Before we really start


Why using R for genetics/genomics?

  • you will need to learn R anyhow to analyse non genetic data
  • you may not want to learn how to use 50 different pieces of software with various syntax
  • you may want to integrate your analysis into a larger workflow (e.g. statistical analysis, plots)

Outcome

  • gain familiarity with R
  • understand how microsatellite data can be handled in R
  • perform basic genetic analyses using the packages adegenet, poppr and pegas
  • use R Markdown to produce a report of the analyses, incorporating the workflow and results

Homework

  • analyse and interpret a Eurasian lynx dataset & produce a report (PDF) as part of course evaluation

About this document

This document is an R Markdown document. An R Markdown document is the combination of a Markdown document with chunks of R code in the middle that are evaluated using an R package called knitr.

R Markdown document can be used interactively or they can be turned into beautiful HTML document or other (e.g. PDF, Word, …) if you click on the button “Knit” on top.

The benefits of using an R Markdown document is that you have your text, scripts, results, plots and tables in a single place.

You can edit the current R Markdown file to add your own notes to it (but save it under a different name, so that you can go back to the original version if you need to) and to try out or even modify the R code.

If you prefer to see pure R code only, just do:

# file.remove("R4G_course.R") # uncomment and run to remove existing file
knitr::purl("R4G_course.Rmd")

Note: it will create a file called R4G_course.R but it won’t overwrite it if it already exists. So, if you want some changes to be taken into account, make sure you delete the R file beforehand (e.g., by running the first line in the chunk above).

Basic Markdown syntax

Just write plain text in the script. You can use different number of stars to indicate:

  • *italic* -> italic
  • *bold* -> bold
  • **both** -> both

Basic knitr syntax

Use ```{r} to start an R chunk and ``` to close it.

For example if you type 1 + 1 within a chunk, it will be displayed as:

1 + 1
## [1] 2

Then, to evaluate the chunk, just press CTRL-R after having put your mouse cursor inside the chunk (anywhere).

You can learn more about R Markdown on this online book or if you already know a little, the R Studio cheatsheet will help you to keep it all in mind.


Getting all the R packages ready

We will use the R packages adegenet, poppr and pegas for the analyses and ggplot2, lattice and viridisLite for the plots. If you have already installed before, you can simply load the packages as follow:

library(adegenet)
library(poppr)
library(pegas)
library(ggplot2)
library(lattice)
library(viridisLite)

Note 1: If one of the package is missing, then you must install it BEFORE loading them. So you may need to use, for ex:

install.packages("adegenet")

Note 2: To be able to “knit” the R Markdown, you will also need other R packages but try knitting and R Studio should do the rest for you.

We also add a few extra functions we coded for you:

source("scripts/tools.R")
## [1] "All functions succesfully loaded"

Importing the data into R


The genepop format

We will be using a very common type of input file for microsatellite data, which was originally developed for a stand-alone program called Genepop. You will learn how to create a genepop file with Jörns tomorrow. Today we will use one that we have made for you.

Here is the basic structure:

comment line
          locus-1, locus-2, locus-3, ... 
pop
Ind-1  ,   100102   135135   204208  ...
Ind-2  ,   100100   131139   200208  ...
Ind-3  ,   102102   131139   200204  ...
Ind-4  ,   000000   135139   208208  ...
...
  1. the first line is a “comment line” - you can keep notes on your project here. E.g. “24 animals, 8 loci”

  2. the second line contains the names of the loci (locus-1, etc).

  3. the third line has a “pop flag” that indicates the next samples belong to the same population.

  4. below this (until the next pop flag), every line is a multi-locus genotype per individual belonging to the population

    • the first column with sample ID (followed by a comma)
    • the following columns with the genotype at a given locus (the two alleles, without a separator)
    • missing data at a locus is denoted by “000000”

Is everyone comfortable with the terms locus, allele and genotype?


Reading your data into R

In the folder data in this R Studio project you can find the genepop file called microbov_small.gen.

We will create an R object that contains the data in microbov_small.gen using the read.genepop() function of adegenet. Here we need to provide the path of the file as "/data/microbov_small.gen" using file =, and some information about how microsatellite data is encoded in the file using ncode =. (And here we also set the argument quiet = but you don’t have to, this is just to shorten this document by not displaying some messages.)

myData <- read.genepop(file = "data/microbov_small.gen", ncode = 3, quiet = TRUE)

Why did we use ncode = 3?

Now the data is read into R.

If there had been a problem with the data file (e.g. incorrectly formatted), then we would have got an error message. Let’s try:

badFormat <- read.genepop(file = "data/badFormatting.gen", ncode = 3, quiet = TRUE)
## Error in dimnames(x) <- dn: length of 'dimnames' [1] not equal to array extent

We “broke” this input file by including fewer locus names than data. The error message is telling us that the length of dimnames [1] (= number of locus names) does not correspond to the number of columns containing the data.

As you can see, R provides us a clue about what the problem is.

Another way we can generate an error is by providing wrong arguments to the function, despite having a properly formatted file:

badInput <- read.genepop(file = "data/microbov_small.gen", ncode = 2, quiet = TRUE)
## Error in read.genepop(file = "data/microbov_small.gen", ncode = 2, quiet = TRUE): some alleles are not encoded with 2 characters
## Check 'ncode' argument

Unsurprisingly, there will be an error if you type the path to the file incorrectly.

One way to limit such issues is to list all genepop files in the folder containing the data (to make sure that the file is where you are trying to read it):

dir(path = "data/", pattern = "*.gen")
## [1] "badFormatting.gen"  "lynx.gen"           "microbov_small.gen" "nancycats.gen"

R representations of your data

The genind format

We haven’t looked at our in data yet!

myData
## /// GENIND OBJECT /////////
## 
##  // 160 individuals; 15 loci; 143 alleles; size: 130.5 Kb
## 
##  // Basic content
##    @tab:  160 x 143 matrix of allele counts
##    @loc.n.all: number of alleles per locus (range: 5-14)
##    @loc.fac: locus factor for the 143 columns of @tab
##    @all.names: list of allele names for each locus
##    @ploidy: ploidy of each individual  (range: 2-2)
##    @type:  codom
##    @call: read.genepop(file = "data/microbov_small.gen", ncode = 3, quiet = TRUE)
## 
##  // Optional content
##    @pop: population of each individual (group size range: 15-31)

Geeky note: For more details on the content of the genind object, just try:

str(myData)
## Formal class 'genind' [package "adegenet"] with 11 slots
##   ..@ tab      : int [1:160, 1:143] 2 1 1 2 1 1 0 2 1 2 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:160] "AFBIBOR9503" "AFBIBOR9504" "AFBIBOR9505" "AFBIBOR9506" ...
##   .. .. ..$ : chr [1:143] "INRA63.183" "INRA63.181" "INRA63.177" "INRA63.175" ...
##   ..@ loc.fac  : Factor w/ 15 levels "INRA63","INRA5",..: 1 1 1 1 1 1 1 2 2 2 ...
##   ..@ loc.n.all: Named int [1:15] 7 5 12 5 10 9 7 7 12 9 ...
##   .. ..- attr(*, "names")= chr [1:15] "INRA63" "INRA5" "ETH225" "ILSTS5" ...
##   ..@ all.names:List of 15
##   .. ..$ INRA63: chr [1:7] "183" "181" "177" "175" ...
##   .. ..$ INRA5 : chr [1:5] "137" "141" "143" "139" ...
##   .. ..$ ETH225: chr [1:12] "147" "157" "139" "141" ...
##   .. ..$ ILSTS5: chr [1:5] "190" "186" "194" "184" ...
##   .. ..$ HEL5  : chr [1:10] "149" "151" "163" "165" ...
##   .. ..$ HEL1  : chr [1:9] "103" "109" "105" "107" ...
##   .. ..$ INRA35: chr [1:7] "102" "104" "120" "110" ...
##   .. ..$ ETH152: chr [1:7] "191" "195" "197" "193" ...
##   .. ..$ INRA23: chr [1:12] "213" "215" "197" "199" ...
##   .. ..$ ETH10 : chr [1:9] "209" "211" "207" "217" ...
##   .. ..$ HEL9  : chr [1:11] "153" "159" "165" "155" ...
##   .. ..$ CSSM66: chr [1:12] "181" "189" "185" "193" ...
##   .. ..$ INRA32: chr [1:12] "162" "178" "180" "184" ...
##   .. ..$ ETH3  : chr [1:11] "117" "125" "115" "127" ...
##   .. ..$ BM2113: chr [1:14] "140" "142" "122" "134" ...
##   ..@ ploidy   : int [1:160] 2 2 2 2 2 2 2 2 2 2 ...
##   ..@ type     : chr "codom"
##   ..@ other    : NULL
##   ..@ call     : language read.genepop(file = "data/microbov_small.gen", ncode = 3, quiet = TRUE)
##   ..@ pop      : Factor w/ 7 levels "AFBIBOR9522",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..@ strata   : NULL
##   ..@ hierarchy: NULL

or for even more geeky details:

print.AsIs(myData)

As you will see, the genind object is a particular kind of list. Technically it is called a S4 object. Instead of accessing elements with $, when using S4 objects elements (called slots) are typically accessible with @ (even if the programmer behind adegenet made it possible to use $ anyway).

Since manipulating such objects is a little complicated, you should use the accessors, whenever available:

nInd(myData) # Number of individuals
## [1] 160
nLoc(myData) # Number of loci
## [1] 15
nPop(myData) # Number of populations
## [1] 7
locNames(myData) # Names of loci
##  [1] "INRA63" "INRA5"  "ETH225" "ILSTS5" "HEL5"   "HEL1"   "INRA35" "ETH152" "INRA23" "ETH10"  "HEL9"   "CSSM66" "INRA32" "ETH3"   "BM2113"
alleles(myData) # List of all alleles
## $INRA63
## [1] "183" "181" "177" "175" "179" "185" "171"
## 
## $INRA5
## [1] "137" "141" "143" "139" "149"
## 
## $ETH225
##  [1] "147" "157" "139" "141" "153" "149" "155" "143" "159" "151" "137" "145"
## 
## $ILSTS5
## [1] "190" "186" "194" "184" "182"
## 
## $HEL5
##  [1] "149" "151" "163" "165" "167" "155" "157" "153" "161" "159"
## 
## $HEL1
## [1] "103" "109" "105" "107" "101" "117" "113" "111" "115"
## 
## $INRA35
## [1] "102" "104" "120" "110" "108" "114" "106"
## 
## $ETH152
## [1] "191" "195" "197" "193" "201" "199" "205"
## 
## $INRA23
##  [1] "213" "215" "197" "199" "209" "205" "211" "203" "207" "217" "201" "193"
## 
## $ETH10
## [1] "209" "211" "207" "217" "219" "221" "215" "223" "213"
## 
## $HEL9
##  [1] "153" "159" "165" "155" "161" "163" "167" "157" "149" "169" "151"
## 
## $CSSM66
##  [1] "181" "189" "185" "193" "197" "183" "199" "187" "179" "195" "171" "191"
## 
## $INRA32
##  [1] "162" "178" "180" "184" "202" "182" "176" "174" "164" "160" "204" "168"
## 
## $ETH3
##  [1] "117" "125" "115" "127" "103" "123" "129" "131" "119" "109" "113"
## 
## $BM2113
##  [1] "140" "142" "122" "134" "136" "146" "138" "130" "124" "150" "128" "132" "126" "144"
nAll(myData, onlyObserved  = TRUE) # Number of alleles
## INRA63  INRA5 ETH225 ILSTS5   HEL5   HEL1 INRA35 ETH152 INRA23  ETH10   HEL9 CSSM66 INRA32   ETH3 BM2113 
##      7      5     12      5     10      9      7      7     12      9     11     12     12     11     14
indNames(myData) # Names of individuals
##   [1] "AFBIBOR9503"  "AFBIBOR9504"  "AFBIBOR9505"  "AFBIBOR9506"  "AFBIBOR9507"  "AFBIBOR9508"  "AFBIBOR9509"  "AFBIBOR9510"  "AFBIBOR9511"  "AFBIBOR9512" 
##  [11] "AFBIBOR9513"  "AFBIBOR9514"  "AFBIBOR9515"  "AFBIBOR9516"  "AFBIBOR9517"  "AFBIBOR9518"  "AFBIBOR9519"  "AFBIBOR9520"  "AFBIBOR9523"  "AFBIBOR9521" 
##  [21] "AFBIBOR9522"  "AFBIZEB9453"  "AFBIZEB9454"  "AFBIZEB9455"  "AFBIZEB9456"  "AFBIZEB9457"  "AFBIZEB9458"  "AFBIZEB9459"  "AFBIZEB9460"  "AFBIZEB9461" 
##  [31] "AFBIZEB9462"  "AFBIZEB9463"  "AFBIZEB9464"  "AFBIZEB9465"  "AFBIZEB9466"  "AFBIZEB9467"  "AFBIZEB9468"  "AFBIZEB9469"  "AFBIZEB9470"  "AFBIZEB9471" 
##  [41] "AFBIZEB9472"  "AFBIZEB9473"  "AFBIZEB9474"  "AFBIZEB9475"  "AFBIZEB9476"  "AFBIZEB9477"  "AFBIZEB9478"  "AFBIZEB9479"  "AFBIZEB9480"  "AFBIZEB9481" 
##  [51] "AFBIZEB9482"  "AFBIZEB9483"  "AFBTLAG9402"  "AFBTLAG9403"  "AFBTLAG9404"  "AFBTLAG9405"  "AFBTLAG9406"  "AFBTLAG9407"  "AFBTLAG9408"  "AFBTLAG9409" 
##  [61] "AFBTLAG9410"  "AFBTLAG9411"  "AFBTLAG9412"  "AFBTLAG9413"  "AFBTLAG9414"  "AFBTLAG9415"  "AFBTLAG9416"  "AFBTND202"    "AFBTND205"    "AFBTND206"   
##  [71] "AFBTND207"    "AFBTND208"    "AFBTND209"    "AFBTND211"    "AFBTND212"    "AFBTND213"    "AFBTND214"    "AFBTND215"    "AFBTND216"    "AFBTND217"   
##  [81] "AFBTND221"    "AFBTND222"    "AFBTND223"    "AFBTND233"    "AFBTND241"    "AFBTND242"    "AFBTND244"    "AFBTND248"    "AFBTND253"    "AFBTND254"   
##  [91] "AFBTND255"    "AFBTND257"    "AFBTND258"    "AFBTND259"    "AFBTND284"    "AFBTND285"    "AFBTND292"    "AFBTSOM9352"  "AFBTSOM9353"  "AFBTSOM9354" 
## [101] "AFBTSOM9355"  "AFBTSOM9356"  "AFBTSOM9357"  "AFBTSOM9358"  "AFBTSOM9359"  "AFBTSOM9360"  "AFBTSOM9361"  "AFBTSOM9362"  "AFBTSOM9363"  "AFBTSOM9364" 
## [111] "AFBTSOM9365"  "AFBTSOM9366"  "AFBTSOM9367"  "AFBTSOM9368"  "AFBTSOM9369"  "AFBTSOM9370"  "AFBTSOM9371"  "AFBTSOM9372"  "AFBTSOM9373"  "AFBTSOM9374" 
## [121] "FRBTAUB9061"  "FRBTAUB9062"  "FRBTAUB9063"  "FRBTAUB9064"  "FRBTAUB9065"  "FRBTAUB9066"  "FRBTAUB9067"  "FRBTAUB9068"  "FRBTAUB9069"  "FRBTAUB9070" 
## [131] "FRBTAUB9071"  "FRBTAUB9072"  "FRBTAUB9073"  "FRBTAUB9074"  "FRBTAUB9075"  "FRBTAUB9076"  "FRBTBAZ15654" "FRBTBAZ15655" "FRBTBAZ25576" "FRBTBAZ25578"
## [141] "FRBTBAZ25950" "FRBTBAZ25954" "FRBTBAZ25956" "FRBTBAZ25957" "FRBTBAZ26078" "FRBTBAZ26092" "FRBTBAZ26097" "FRBTBAZ26110" "FRBTBAZ26112" "FRBTBAZ26352"
## [151] "FRBTBAZ26354" "FRBTBAZ26375" "FRBTBAZ26386" "FRBTBAZ26388" "FRBTBAZ26396" "FRBTBAZ26400" "FRBTBAZ26401" "FRBTBAZ26403" "FRBTBAZ26439" "FRBTBAZ26457"
popNames(myData) # Name of the last individual in each population
## [1] "AFBIBOR9522"  "AFBIZEB9483"  "AFBTLAG9416"  "AFBTND292"    "AFBTSOM9374"  "FRBTAUB9076"  "FRBTBAZ26457"

Some of the accessors can also be used to redefine some information (to handle with great care):

#let's give the pops some names:

#myPops <- c("Pop1", "Pop2", "Pop3", "Pop4", "Pop5", "Pop6", "Pop7")
myPops <- paste0("Pop", 1:nPop(myData))
popNames(myData) <- myPops
popNames(myData)
## [1] "Pop1" "Pop2" "Pop3" "Pop4" "Pop5" "Pop6" "Pop7"
pop(myData)
##   [1] Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop1 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2
##  [32] Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop2 Pop3 Pop3 Pop3 Pop3 Pop3 Pop3 Pop3 Pop3 Pop3 Pop3
##  [63] Pop3 Pop3 Pop3 Pop3 Pop3 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4 Pop4
##  [94] Pop4 Pop4 Pop4 Pop4 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop5 Pop6 Pop6 Pop6 Pop6
## [125] Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop6 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7 Pop7
## [156] Pop7 Pop7 Pop7 Pop7 Pop7
## Levels: Pop1 Pop2 Pop3 Pop4 Pop5 Pop6 Pop7

The loci format

Not all R packages dealing with genetics/genomics use genind objects (it would be convenient if they did!). In particular, the package pegas which allows for the computation of many useful metrics relies on an alternative format called loci.

myData_loci <- genind2loci(myData) # or as.loci(myData)
myData_loci

The loci format is more simple than the geneind format (and so what you can do with it is more limited).


Geeky note: A loci object is simply a traditional data.frame (an S3 object) with an additional attribute called "locicol":

str(myData_loci)
## Classes 'loci' and 'data.frame': 160 obs. of  16 variables:
##  $ population: Factor w/ 7 levels "Pop1","Pop2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ INRA63    : Factor w/ 18 levels "171/175","175/175",..: 17 16 10 17 10 10 9 17 10 17 ...
##  $ INRA5     : Factor w/ 11 levels "137/137","137/139",..: 3 9 9 9 9 4 6 6 6 9 ...
##  $ ETH225    : Factor w/ 29 levels "137/147","139/139",..: 19 8 2 10 25 23 19 26 3 28 ...
##  $ ILSTS5    : Factor w/ 13 levels "182/184","182/186",..: 11 8 13 6 5 5 6 5 3 8 ...
##  $ HEL5      : Factor w/ 28 levels "149/149","151/151",..: 1 6 7 28 17 17 27 26 26 28 ...
##  $ HEL1      : Factor w/ 22 levels "101/101","101/103",..: 7 10 4 5 6 5 4 6 5 5 ...
##  $ INRA35    : Factor w/ 13 levels "102/102","102/104",..: 2 4 4 4 4 4 4 2 4 4 ...
##  $ ETH152    : Factor w/ 14 levels "191/191","191/195",..: 2 7 8 7 1 8 7 8 2 8 ...
##  $ INRA23    : Factor w/ 32 levels "193/199","197/199",..: 31 3 4 11 11 4 26 20 19 9 ...
##  $ ETH10     : Factor w/ 25 levels "207/207","207/209",..: 8 5 3 12 8 11 7 11 6 6 ...
##  $ HEL9      : Factor w/ 29 levels "149/157","151/161",..: 3 6 9 14 6 18 9 4 4 23 ...
##  $ CSSM66    : Factor w/ 34 levels "171/183","179/179",..: 10 10 8 33 24 18 22 20 20 20 ...
##  $ INRA32    : Factor w/ 27 levels "160/204","162/162",..: 2 2 19 2 6 27 19 21 5 7 ...
##  $ ETH3      : Factor w/ 30 levels "103/117","109/117",..: 10 13 8 10 9 10 9 6 25 5 ...
##  $ BM2113    : Factor w/ 43 levels "122/122","122/124",..: 41 40 1 41 30 41 28 30 7 37 ...
##  - attr(*, "locicol")= int [1:15] 2 3 4 5 6 7 8 9 10 11 ...
head(data.frame(myData_loci), n = 10) # first 10 rows
attr(myData_loci, "locicol") # columns that contain loci
##  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

Checking your data

Let’s check some basic things about the data in myData.


Missing data

An important consideration for analysis is the amount of missing data in our dataset. Several types of analyses don’t cope very well with missing data. In some cases this may lead to poor estimates, in other cases it may lead to samples or loci not being considered in the analysis.

We can use info_table() from poppr to find out about missing data, and also generate a convenient plot by including plot = TRUE.

missing <- info_table(myData, df = TRUE) # df = TRUE uses a long rather than 
                                         # wide format for the df
missing[order(missing$Missing, decreasing = TRUE), ] # We reorder the table to show the highest missing on top
info_table(myData, plot = TRUE, low = "white")

##           Locus
## Population INRA63  INRA5 ETH225 ILSTS5   HEL5   HEL1 INRA35 ETH152 INRA23  ETH10   HEL9 CSSM66 INRA32   ETH3 BM2113   Mean
##      Pop1       .      .      . 0.0476      .      .      .      .      .      .      .      .      .      .      . 0.0032
##      Pop2       .      .      .      . 0.0323      .      .      . 0.0645      .      .      . 0.0323      .      . 0.0086
##      Pop3       .      .      . 0.3333      .      .      . 0.0667      .      .      .      .      .      .      . 0.0267
##      Pop4       .      .      . 0.1333      . 0.0333 0.0333      . 0.0333      .      .      .      .      .      . 0.0156
##      Pop5       .      . 0.0435 0.1739      .      .      .      .      .      .      .      .      . 0.0435      . 0.0174
##      Pop6       .      .      . 0.2500      .      .      .      .      .      .      .      .      .      .      . 0.0167
##      Pop7  0.1667 0.2917      . 0.2500 0.0833 0.2083      . 0.2917 0.0417 0.3333 0.0417 0.2917 0.3750 0.3750 0.2083 0.1972
##      Total 0.0250 0.0437 0.0063 0.1500 0.0187 0.0375 0.0063 0.0500 0.0250 0.0500 0.0063 0.0437 0.0625 0.0625 0.0312 0.0413

Geeky note: it is tricky but you can modify anything in this plot even when info_table() as no option for it:

theplot <- ggplot_build(last_plot())
theplot$data[[4]]$size <- 3
theplot$data[[4]]$angle <- 45
plot(ggplot_gtable(theplot))

We can see that there is missing data at several loci, and that this is particularly bad for our population Pop7.

As a general rule of thumb, we try to keep missing data below 5% in order to minimize the impact on analyses.

How do you think we can achieve this?

Using genotype_curve() we can check how many loci we need in order to discriminate our genotypes. This poppr function randomly samples loci without replacement and counts the number of multilocus genotypes observed.

gencurv <- genotype_curve(myData)

How many loci do we need in order to discriminate our genotypes?

Geeky note: you can get the exact numbers:

apply(gencurv, 2, range)
##       NumLoci
##         1   2   3   4   5   6   7   8   9  10  11  12  13  14
##   [1,] 12  44  84 135 150 154 155 156 157 157 157 157 157 157
##   [2,] 44 131 154 157 157 157 157 157 157 157 157 157 157 157

As missing data is concentrated in a few loci and populations, rather than being thinly spread throughout the dataset, we can remove the worst offenders: a locus (ILSTS5) and a population (Pop7).

We can ‘drop’ a population from the genind object using the following code, if we know the number (ID) of the population in the list of populations.

myData[pop = -c(7), drop = TRUE] # drop = TRUE updates the number of remaining alleles

Sometimes it is more useful to be able to remove populations by their name. For this we need a bit more code:

removePop <- c("Pop7")
myData <- myData[pop = !popNames(myData) %in% removePop, drop = TRUE]

Removing loci from a genind object works in a similar fashion. If you know the number (ID) of a locus, then you can use the short code myData[loc = -c(1)]. But again, it is better to use names.

removeLoc <- c("ILSTS5")
myData <- myData[loc = !locNames(myData) %in% removeLoc, drop = TRUE]

Now let’s check the new version of myData:

myData
## /// GENIND OBJECT /////////
## 
##  // 136 individuals; 14 loci; 137 alleles; size: 111.1 Kb
## 
##  // Basic content
##    @tab:  136 x 137 matrix of allele counts
##    @loc.n.all: number of alleles per locus (range: 5-14)
##    @loc.fac: locus factor for the 137 columns of @tab
##    @all.names: list of allele names for each locus
##    @ploidy: ploidy of each individual  (range: 2-2)
##    @type:  codom
##    @call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)
## 
##  // Optional content
##    @pop: population of each individual (group size range: 15-31)

LEt’s also rerun info_table() to see what’s changed:

info_table(myData, plot = TRUE, low = "white")

##           Locus
## Population INRA63 INRA5 ETH225   HEL5   HEL1 INRA35 ETH152 INRA23 ETH10 HEL9 CSSM66 INRA32   ETH3 BM2113   Mean
##      Pop1       .     .      .      .      .      .      .      .     .    .      .      .      .      .      .
##      Pop2       .     .      . 0.0323      .      .      . 0.0645     .    .      . 0.0323      .      . 0.0092
##      Pop3       .     .      .      .      .      . 0.0667      .     .    .      .      .      .      . 0.0048
##      Pop4       .     .      .      . 0.0333 0.0333      . 0.0333     .    .      .      .      .      . 0.0071
##      Pop5       .     . 0.0435      .      .      .      .      .     .    .      .      . 0.0435      . 0.0062
##      Pop6       .     .      .      .      .      .      .      .     .    .      .      .      .      .      .
##      Total      .     . 0.0074 0.0074 0.0074 0.0074 0.0074 0.0221     .    .      . 0.0074 0.0074      . 0.0053

We have significantly reduced the amount of missing data in myData.

How else could we have solved this?

How about our ability to discriminate genotypes?

genotype_curve(myData)

How does that compare to the previous genotype curve?

How does the maximum value for MLG compare to the number of individuals?


MLG (MultiLocus Genotypes)

Depending on how we obtain our genetic samples, we may not know if we have genotyped the same individual more than once.

For example, if we have been genotyping samples collected in a non-invasive way (e.g. hair or faeces), then we don’t really have a way of knowing if we have obtained more than one sample per individual.

There is an R package called alleleMatch that can investigate this in detail. However, it is no longer maintained. Thankfully, poppr has some basic functionality to examine this. Using mlg() we can check for the number of unique multilocus genotypes in myData.

mlg(myData)
## #############################
## # Number of Individuals:  136 
## # Number of MLG:  133 
## #############################
## [1] 133

We have 136 samples, but only 133 unique multilocus genotypes. This suggests that 3 samples correspond to replicates from the same individual(s). (We introduced that in the data on purpose to show you how to deal with such issues.)

Note: the mlg() function did not tell us which samples are the same. For this we can use mlg.id(). But this gives us a very long list, because it also includes the genotypes that occur only once. Let us check the beginning of the output from mlg.id() using head():

head(mlg.id(myData))
## $`1`
## [1] "AFBIZEB9466"
## 
## $`2`
## [1] "FRBTAUB9073"
## 
## $`3`
## [1] "AFBTSOM9356"
## 
## $`4`
## [1] "AFBIZEB9471"
## 
## $`5`
## [1] "FRBTAUB9067"
## 
## $`6`
## [1] "FRBTAUB9064"

What we want to know is which are the genotypes that occur more than once?

We achieve this by looking for entries in the list with a length greater than one.

myMLG <- mlg.id(myData)
myMLG[lengths(myMLG) > 1]
## $`104`
## [1] "AFBIZEB9482" "AFBIZEB9483"
## 
## $`120`
## [1] "AFBIBOR9520" "AFBIBOR9523"
## 
## $`125`
## [1] "AFBTSOM9373" "AFBTSOM9374"

As we do not want to include replicates of samples in our data, we should now remove one of each replicated genotype. We will remove AFBIZEB9483, AFBIBOR9523, and AFBTSOM9374

myData <- myData[!indNames(myData) %in% c("AFBIZEB9483", "AFBIBOR9523","AFBTSOM9374"), drop = TRUE]
myData
## /// GENIND OBJECT /////////
## 
##  // 133 individuals; 14 loci; 137 alleles; size: 109.1 Kb
## 
##  // Basic content
##    @tab:  133 x 137 matrix of allele counts
##    @loc.n.all: number of alleles per locus (range: 5-14)
##    @loc.fac: locus factor for the 137 columns of @tab
##    @all.names: list of allele names for each locus
##    @ploidy: ploidy of each individual  (range: 2-2)
##    @type:  codom
##    @call: .local(x = x, i = i, j = j, drop = drop)
## 
##  // Optional content
##    @pop: population of each individual (group size range: 15-30)

Now we have 133 unique genotypes in myData.

If this were data from non-invasively collected samples, we may expect there to be more duplicate genotypes that we have not detected yet. Why?


Geeky note:

If you have to remove a lot of samples, you can do it easily by replacing the previous call by:

samplesToKeep <- unlist(lapply(myMLG, function(x) x[1]))
myData <- myData[indNames(myData) %in% samplesToKeep, drop = TRUE]

Ploidy

info_table() can also be used to find out about the ploidy of your data. As the results are presented for all samples, here just the first couple lines of output (achieved by using head())

head(info_table(myData, type = "ploidy"), n = 2)
##              Loci
## Samples       INRA63 INRA5 ETH225 HEL5 HEL1 INRA35 ETH152 INRA23 ETH10 HEL9 CSSM66 INRA32 ETH3 BM2113
##   AFBIBOR9503      2     2      2    2    2      2      2      2     2    2      2      2    2      2
##   AFBIBOR9504      2     2      2    2    2      2      2      2     2    2      2      2    2      2
table(info_table(myData, type = "ploidy"), useNA = "always") ## counts different values
## 
##    2 <NA> 
## 1852   10

The numbers make sense:

nLoc(myData) * nInd(myData)
## [1] 1862

How many alleles expect in a tetraploid?

Could this number be 1 for a diploid?


Are loci informative?

Another important consideration is if our loci are variable and informative. Otherwise we’re wasting time and space on keeping them! The function informloci() can be used to detect uninformative loci and remove them.

informloci(myData)
## cutoff value: 1.50375939849624 % ( 2 samples ).
## MAF         : 0.01
## 
## All sites polymorphic
## /// GENIND OBJECT /////////
## 
##  // 133 individuals; 14 loci; 137 alleles; size: 109.2 Kb
## 
##  // Basic content
##    @tab:  133 x 137 matrix of allele counts
##    @loc.n.all: number of alleles per locus (range: 5-14)
##    @loc.fac: locus factor for the 137 columns of @tab
##    @all.names: list of allele names for each locus
##    @ploidy: ploidy of each individual  (range: 2-2)
##    @type:  codom
##    @call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)
## 
##  // Optional content
##    @pop: population of each individual (group size range: 15-30)

In our case, all loci are informative and they are kept. If we had uninformative loci, and wanted to remove them, we would have run the following.

myData <- informloci(myData)

Genetic variation


A very simple exploration

You can easily explore genetic differences between individuals – within each population – using our home made little function:

pairwise_similarity(myData, pop = "Pop1")

We can represent the results visually:

lapply(unique(myData@pop), function(pop) hist(pairwise_similarity(myData, pop = pop, as_vector = TRUE), xlim = c(0, 1))) # don't forget as_vector = TRUE

How would you do the same thing across all populations?


Allele vs genotype frequencies

Consider the following two populations with different genotype frequencies:

             AA    Aa    aa
      Pop1   50     0    50
      Pop2   25    50    25

Do the genotype frequencies of the two populations differ?

Do the allele frequencies of the two populations differ?


The Hardy-Weinberg equilibrium (HWE)

Assumptions

  1. Genotype frequencies are the same in males and females.

  2. Individuals mate at random with respect to their genotype at this particular locus (panmixia).

  3. Meiosis is fair.

  4. There is no new genetic material (no mutation).

  5. There is no gene flow (no migration).

  6. The population is of infinite size (no drift).

  7. All matings produce the same average number of offspring (no selection on fecundity).

  8. There are no differences among genotypes in the probability of survival (no selection on survival).

  9. Generations do not overlap (no selection on reproductive rate).


Freq. in next generation

Using frequency of alleles in the current generation …

\(p_t = f(A)\)

\(q_t = f(a)\)

… the frequencies in the next generation can be calculated:

\(f(AA) = p_t^{2}\)

\(f(Aa) = 2p_tq_t\)

\(f(aa) = q_t^{2}\)

\(p_{t+1} = f(AA) + \dfrac{f(Aa)}{2} = p_t^2 + \dfrac{2p_tq_t}{2} = p_t^2 + p_tq_t = p_t^2 + p_t(1-p_t) = p_t^2 + p_t - p_t^2 = p_t\)

\(q_{t+1} = f(aa) + \dfrac{f(Aa)}{2} = q_t^2 + \dfrac{2p_tq_t}{2} = q_t^2 + p_tq_t = q_t^2 + (1-q_t)q_t = q_t^2 + q_t - q_t^2 = q_t\)

Under HWE, the allelic frequencies remain the same over time!


Why do we care?

Evolution is a change in allele frequencies in a population over time

When a locus in a population is in HWE, it is not evolving: allele frequencies will stay the same across generations.

If HWE assumptions are not met, evolution can happen (allele frequencies may change).

Mutation, non-random mating, gene flow, genetic drift (caused by finite population size), and natural selection violate HWE assumptions and are thus “mechanisms” by which evolution may proceed.

The HWE is thus the null model of micro-evolution.

So is it good/bad for a locus to be in HWE?


“Neutral” evolution

Very broadly speaking, there are two types of population genetics analyses you can do:

  1. those that assume loci are in HWE, and

  2. those that do not.

It is thus important to realise if your data reject the HWE assumptions!


Testing the HWE

With hw.test() from pegas we can test for HWE across our populations:

hw.test(myData, B = 0) ## for Monte Carlo test instead of asymptotic,
##            chi^2 df  Pr(chi^2 >)
## INRA63  88.42099 21 3.017570e-10
## INRA5   42.19441 10 6.924324e-06
## ETH225 124.64277 66 1.745822e-05
## HEL5   180.45114 45 0.000000e+00
## HEL1   184.99600 28 0.000000e+00
## INRA35  60.39321 21 1.113412e-05
## ETH152  47.09923 21 9.106719e-04
## INRA23  77.80636 66 1.516741e-01
## ETH10   70.74219 36 4.795813e-04
## HEL9   130.20509 55 4.867201e-08
## CSSM66  82.65631 66 8.075179e-02
## INRA32 209.96734 66 1.110223e-16
## ETH3    44.98197 55 8.303643e-01
## BM2113 184.20731 91 2.747856e-08
                       ## use large value of B (>= 1000)

Here we have conducted a \(\chi^{2}\)-test based on observed and expected genotype frequencies calculated from the allele frequencies.

Would we expect loci to be in HWE if we consider all populations together as we have done?

The Wahlund effect

The Wahlund effect refers to the reduction in the number of heterozygotes due to population structure.

Consider two populations:

             AA    Aa    aa
      Pop1   50     0     0
      Pop2    0     0    50

Here, each population at the HWE.

However, if we treat them as a single population, there are no heterozygotes, and this merged-population is not in HWE.


Let’s check by population. We will split myData by population using the seppop function. This creates a list of genind objects, with every entry in the list consisting of a population.

To apply the HWE test with hw.test() to every item in the list (ie every population), we use (as above) the function lapply().

lapply(seppop(myData), hw.test)
## $Pop1
##             chi^2 df Pr(chi^2 >) Pr.exact
## INRA63  0.8333333  6  0.99114998    1.000
## INRA5  12.9166667  6  0.04437854    0.006
## ETH225 21.4390432 36  0.97397104    0.870
## HEL5   37.7654321 28  0.10291932    0.237
## HEL1   16.0969388 15  0.37563846    0.404
## INRA35 10.0888889  6  0.12095828    0.229
## ETH152 10.2835732 10  0.41597667    0.105
## INRA23 19.0958270 21  0.57899284    0.207
## ETH10  13.9506173 21  0.87171109    0.814
## HEL9   26.7469136 28  0.53205905    0.747
## CSSM66 29.6798155 15  0.01312988    0.186
## INRA32 48.1087311 36  0.08546174    0.079
## ETH3    3.8776014 10  0.95269969    0.965
## BM2113 18.1530612 21  0.63929854    0.584
## 
## $Pop2
##            chi^2 df  Pr(chi^2 >) Pr.exact
## INRA63 45.692857 21 1.403507e-03    0.000
## INRA5  19.552921 10 3.377642e-02    0.024
## ETH225 54.411111 28 1.999921e-03    0.000
## HEL5   59.525763 21 1.506288e-05    0.000
## HEL1    3.949720 10 9.495867e-01    0.868
## INRA35 13.606771 10 1.916952e-01    0.036
## ETH152 11.280563 10 3.360814e-01    0.063
## INRA23  9.775034 21 9.816874e-01    0.753
## ETH10  13.687899 15 5.493191e-01    0.230
## HEL9   52.909954 45 1.952332e-01    0.183
## CSSM66 27.401167 36 8.478470e-01    0.592
## INRA32 87.022377 36 4.114494e-06    0.120
## ETH3   11.089471 15 7.462265e-01    0.501
## BM2113 19.367498 28 8.864145e-01    0.893
## 
## $Pop3
##              chi^2 df Pr(chi^2 >) Pr.exact
## INRA63  5.94687500 10  0.81970664    0.561
## INRA5   7.60416667  6  0.26856035    0.106
## ETH225  7.54933378  3  0.05630431    0.073
## HEL5    9.13333333 10  0.51949831    0.181
## HEL1    0.60000000  1  0.43857803    1.000
## INRA35  3.30740741  3  0.34661300    0.161
## ETH152  3.62388427  1  0.05695575    0.065
## INRA23  0.80740741  3  0.84769447    1.000
## ETH10   0.80740741  3  0.84769447    1.000
## HEL9    2.76962525  6  0.83715593    0.821
## CSSM66  1.51859504  1  0.21783223    0.236
## INRA32  0.01783591  1  0.89375751    1.000
## ETH3   11.76480716 10  0.30110547    0.349
## BM2113  5.93357821 10  0.82081249    0.886
## 
## $Pop4
##            chi^2 df  Pr(chi^2 >) Pr.exact
## INRA63 37.158133 10 5.313702e-05    0.040
## INRA5   1.228844  3 7.460949e-01    0.496
## ETH225 29.540706 21 1.016135e-01    0.089
## HEL5   32.331967 28 2.612127e-01    0.370
## HEL1   57.190924 21 3.366698e-05    0.000
## INRA35  3.473136  3 3.242630e-01    0.267
## ETH152  2.228763  6 8.975031e-01    0.756
## INRA23 27.404740 21 1.578545e-01    0.081
## ETH10  20.406585 15 1.568841e-01    0.001
## HEL9   23.035714 15 8.338452e-02    0.007
## CSSM66 18.778597 36 9.920040e-01    0.376
## INRA32  7.384172  3 6.061047e-02    0.038
## ETH3   34.353741 10 1.608659e-04    0.041
## BM2113 13.117038 21 9.044692e-01    0.707
## 
## $Pop5
##            chi^2 df Pr(chi^2 >) Pr.exact
## INRA63 13.791312 15   0.5414118    0.529
## INRA5   7.588642  6   0.2698152    0.051
## ETH225 25.176667 21   0.2395936    0.140
## HEL5    4.665958 10   0.9123471    0.923
## HEL1    1.454694  3   0.6927657    1.000
## INRA35  2.296331  1   0.1296800    0.234
## ETH152  1.845937  6   0.9333083    0.854
## INRA23 16.209393 15   0.3682734    0.194
## ETH10  11.739444 15   0.6986344    0.232
## HEL9   40.562233 36   0.2761318    0.352
## CSSM66 21.933673 21   0.4033407    0.247
## INRA32  4.361975 10   0.9295439    0.498
## ETH3   32.563125 28   0.2522038    0.073
## BM2113 12.740575 10   0.2385404    0.141
## 
## $Pop6
##            chi^2 df  Pr(chi^2 >) Pr.exact
## INRA63  0.117551  1 7.317059e-01    1.000
## INRA5   2.346667  3 5.036398e-01    0.653
## ETH225  9.120676 10 5.206906e-01    0.355
## HEL5    8.699421 28 9.998207e-01    0.991
## HEL1   34.147448 10 1.743706e-04    0.107
## INRA35 34.863905  6 4.579186e-06    0.013
## ETH152  3.484444  6 7.460381e-01    0.340
## INRA23 18.391038 15 2.426658e-01    0.095
## ETH10  23.278571 15 7.840172e-02    0.184
## HEL9   30.950400 15 8.920097e-03    0.112
## CSSM66 31.604938 36 6.776680e-01    0.763
## INRA32  3.998186  6 6.769219e-01    0.742
## ETH3    6.064198 21 9.993764e-01    1.000
## BM2113 35.235556 36 5.047583e-01    0.142

Be careful during your interpretation: the population sizes differ and so does the statical power!

unlist(lapply(seppop(myData), nInd))
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6 
##   20   30   15   30   22   16

Some loci, in some populations, are still not in HWE.

Why may this be?


Linkage Disequilibrium

Another important consideration for analysing genetic data is the independence of loci.

Imagine two loci that are physically very close on the same chromosome. Will alleles at these loci be segregating independently?

Do you know how this is measured?

For this reason, and many others (e.g. Fisher’s runaway), it is possible that alleles at one locus may be associated with alleles at another locus. In other words, we may observe non-random association of alleles at different loci. This is called linkage disequilibrium.

Why is this important? If the dependence between loci is not recognised, the results will be biased.

We can look at this with the ia() function of poppr. This function calculates an index of association over all loci in the genind object:

ia(myData, sample = 999)

##         Ia       p.Ia      rbarD       p.rD 
## 0.84836718 0.00100000 0.06567056 0.00100000

Here we can see the distribution of expected association between alleles at different loci when there is no linkage (dark grey barplot), and the estimate for association among alleles in our total dataset (i.e., all loci and all pops at the same time).

It appears we have a significant association across all populations.

What if we look at the same test per population? To answer this question, we use seppop() and lapply() as before. But now we include ia() and for the argument sample, which defines the number of permutations used to draw the distribution of the association index under the null hypothesis, we want a large number, e.g., 999:

lapply(seppop(myData), ia, sample = 999)

## $Pop1
##         Ia       p.Ia      rbarD       p.rD 
## 0.25067155 0.03900000 0.01941745 0.03900000 
## 
## $Pop2
##          Ia        p.Ia       rbarD        p.rD 
## 0.018941131 0.499000000 0.001466489 0.499000000 
## 
## $Pop3
##         Ia       p.Ia      rbarD       p.rD 
## 0.68851067 0.00100000 0.05471501 0.00100000 
## 
## $Pop4
##           Ia         p.Ia        rbarD         p.rD 
## -0.018113328  0.510000000 -0.001413501  0.510000000 
## 
## $Pop5
##         Ia       p.Ia      rbarD       p.rD 
## 0.46856847 0.00600000 0.03641272 0.00600000 
## 
## $Pop6
##          Ia        p.Ia       rbarD        p.rD 
## -0.29912511  0.97300000 -0.02351494  0.97400000

In some populations we observe significant LD.

Is this important?

If we want to know which two loci are in LD, we can use the pair.ia() function. Let’s consider Pop6 using seppop(myData)[["Pop6"]].

pair.ia(seppop(myData)[["Pop6"]])

##                     Ia    rbarD
## INRA63:INRA5  -2.5e-03 -2.5e-03
## INRA63:ETH225 -5.4e-02 -5.4e-02
## INRA63:HEL5   -2.4e-01 -2.4e-01
## INRA63:HEL1   -1.4e-01 -1.4e-01
## INRA63:INRA35 -9.3e-02 -9.6e-02
## INRA63:ETH152 -4.8e-02 -4.8e-02
## INRA63:INRA23 -1.1e-02 -1.1e-02
## INRA63:ETH10   8.5e-02  8.6e-02
## INRA63:HEL9    5.7e-02  5.8e-02
## INRA63:CSSM66  1.2e-01  1.2e-01
## INRA63:INRA32  1.5e-01  1.5e-01
## INRA63:ETH3   -1.0e-01 -1.1e-01
## INRA63:BM2113 -1.5e-01 -1.5e-01
## INRA5:ETH225  -2.4e-01 -2.4e-01
## INRA5:HEL5    -1.4e-01 -1.5e-01
## INRA5:HEL1    -2.1e-01 -2.1e-01
## INRA5:INRA35   2.0e-02  2.1e-02
## INRA5:ETH152   6.4e-02  6.4e-02
## INRA5:INRA23   1.1e-02  1.1e-02
## INRA5:ETH10   -5.6e-03 -5.6e-03
## INRA5:HEL9    -2.0e-01 -2.0e-01
## INRA5:CSSM66   3.1e-03  3.1e-03
## INRA5:INRA32  -7.0e-02 -7.0e-02
## INRA5:ETH3    -1.6e-01 -1.6e-01
## INRA5:BM2113  -9.9e-02 -9.9e-02
## ETH225:HEL5    8.5e-03  8.6e-03
## ETH225:HEL1    3.0e-01  3.0e-01
## ETH225:INRA35  1.2e-01  1.3e-01
## ETH225:ETH152 -1.7e-01 -1.7e-01
## ETH225:INRA23 -2.3e-02 -2.3e-02
## ETH225:ETH10  -5.2e-02 -5.2e-02
## ETH225:HEL9   -2.2e-02 -2.3e-02
## ETH225:CSSM66 -2.6e-01 -2.6e-01
## ETH225:INRA32 -2.4e-02 -2.4e-02
## ETH225:ETH3    2.3e-01  2.4e-01
## ETH225:BM2113  3.6e-02  3.6e-02
## HEL5:HEL1      1.7e-01  1.8e-01
## HEL5:INRA35   -1.5e-01 -1.7e-01
## HEL5:ETH152    2.8e-02  2.9e-02
## HEL5:INRA23    1.1e-01  1.1e-01
## HEL5:ETH10     1.4e-01  1.4e-01
## HEL5:HEL9      2.2e-01  2.4e-01
## HEL5:CSSM66    5.0e-02  5.0e-02
## HEL5:INRA32   -2.2e-02 -2.2e-02
## HEL5:ETH3      1.7e-02  1.7e-02
## HEL5:BM2113   -8.0e-02 -8.1e-02
## HEL1:INRA35   -2.1e-01 -2.1e-01
## HEL1:ETH152    1.1e-01  1.1e-01
## HEL1:INRA23   -3.3e-16 -3.9e-16
## HEL1:ETH10     1.1e-01  1.2e-01
## HEL1:HEL9      5.5e-02  5.5e-02
## HEL1:CSSM66   -5.2e-02 -5.5e-02
## HEL1:INRA32   -1.6e-02 -1.6e-02
## HEL1:ETH3      4.6e-01  4.9e-01
## HEL1:BM2113   -2.9e-02 -2.9e-02
## INRA35:ETH152 -2.3e-01 -2.3e-01
## INRA35:INRA23  1.1e-01  1.2e-01
## INRA35:ETH10  -2.3e-01 -2.4e-01
## INRA35:HEL9   -2.0e-01 -2.0e-01
## INRA35:CSSM66 -9.2e-02 -9.9e-02
## INRA35:INRA32 -1.8e-01 -2.0e-01
## INRA35:ETH3   -1.5e-01 -1.7e-01
## INRA35:BM2113 -6.7e-02 -7.0e-02
## ETH152:INRA23 -8.2e-02 -8.3e-02
## ETH152:ETH10   1.3e-01  1.3e-01
## ETH152:HEL9   -1.1e-01 -1.1e-01
## ETH152:CSSM66  1.8e-01  1.8e-01
## ETH152:INRA32 -1.2e-01 -1.2e-01
## ETH152:ETH3    1.1e-01  1.2e-01
## ETH152:BM2113  1.6e-01  1.6e-01
## INRA23:ETH10  -1.6e-01 -1.6e-01
## INRA23:HEL9   -1.2e-01 -1.2e-01
## INRA23:CSSM66 -5.1e-02 -5.2e-02
## INRA23:INRA32  2.5e-02  2.5e-02
## INRA23:ETH3    2.8e-02  2.8e-02
## INRA23:BM2113 -1.3e-01 -1.3e-01
## ETH10:HEL9    -2.7e-01 -2.8e-01
## ETH10:CSSM66   4.7e-01  4.8e-01
## ETH10:INRA32  -5.2e-02 -5.2e-02
## ETH10:ETH3     1.7e-01  1.8e-01
## ETH10:BM2113  -1.0e-01 -1.0e-01
## HEL9:CSSM66   -7.1e-02 -7.6e-02
## HEL9:INRA32    4.4e-01  4.7e-01
## HEL9:ETH3     -1.6e-01 -1.7e-01
## HEL9:BM2113   -1.3e-01 -1.3e-01
## CSSM66:INRA32 -1.2e-01 -1.2e-01
## CSSM66:ETH3   -2.2e-02 -2.3e-02
## CSSM66:BM2113 -7.5e-02 -7.6e-02
## INRA32:ETH3   -6.4e-02 -6.5e-02
## INRA32:BM2113 -9.2e-03 -9.3e-03
## ETH3:BM2113   -1.9e-02 -2.0e-02

We can see that LD is not the same for all pairs of loci.

Let us change the plot so that it is easier for colour-blind people to see (e.g. for Dan!). For this we add some additional arguments for colour: low = "black" and high = "green".

pair.ia(myData[pop = "Pop6"], low = "black", high = "green")

##                     Ia    rbarD
## INRA63:INRA5  -2.5e-03 -2.5e-03
## INRA63:ETH225 -5.4e-02 -5.4e-02
## INRA63:HEL5   -2.4e-01 -2.4e-01
## INRA63:HEL1   -1.4e-01 -1.4e-01
## INRA63:INRA35 -9.3e-02 -9.6e-02
## INRA63:ETH152 -4.8e-02 -4.8e-02
## INRA63:INRA23 -1.1e-02 -1.1e-02
## INRA63:ETH10   8.5e-02  8.6e-02
## INRA63:HEL9    5.7e-02  5.8e-02
## INRA63:CSSM66  1.2e-01  1.2e-01
## INRA63:INRA32  1.5e-01  1.5e-01
## INRA63:ETH3   -1.0e-01 -1.1e-01
## INRA63:BM2113 -1.5e-01 -1.5e-01
## INRA5:ETH225  -2.4e-01 -2.4e-01
## INRA5:HEL5    -1.4e-01 -1.5e-01
## INRA5:HEL1    -2.1e-01 -2.1e-01
## INRA5:INRA35   2.0e-02  2.1e-02
## INRA5:ETH152   6.4e-02  6.4e-02
## INRA5:INRA23   1.1e-02  1.1e-02
## INRA5:ETH10   -5.6e-03 -5.6e-03
## INRA5:HEL9    -2.0e-01 -2.0e-01
## INRA5:CSSM66   3.1e-03  3.1e-03
## INRA5:INRA32  -7.0e-02 -7.0e-02
## INRA5:ETH3    -1.6e-01 -1.6e-01
## INRA5:BM2113  -9.9e-02 -9.9e-02
## ETH225:HEL5    8.5e-03  8.6e-03
## ETH225:HEL1    3.0e-01  3.0e-01
## ETH225:INRA35  1.2e-01  1.3e-01
## ETH225:ETH152 -1.7e-01 -1.7e-01
## ETH225:INRA23 -2.3e-02 -2.3e-02
## ETH225:ETH10  -5.2e-02 -5.2e-02
## ETH225:HEL9   -2.2e-02 -2.3e-02
## ETH225:CSSM66 -2.6e-01 -2.6e-01
## ETH225:INRA32 -2.4e-02 -2.4e-02
## ETH225:ETH3    2.3e-01  2.4e-01
## ETH225:BM2113  3.6e-02  3.6e-02
## HEL5:HEL1      1.7e-01  1.8e-01
## HEL5:INRA35   -1.5e-01 -1.7e-01
## HEL5:ETH152    2.8e-02  2.9e-02
## HEL5:INRA23    1.1e-01  1.1e-01
## HEL5:ETH10     1.4e-01  1.4e-01
## HEL5:HEL9      2.2e-01  2.4e-01
## HEL5:CSSM66    5.0e-02  5.0e-02
## HEL5:INRA32   -2.2e-02 -2.2e-02
## HEL5:ETH3      1.7e-02  1.7e-02
## HEL5:BM2113   -8.0e-02 -8.1e-02
## HEL1:INRA35   -2.1e-01 -2.1e-01
## HEL1:ETH152    1.1e-01  1.1e-01
## HEL1:INRA23   -3.3e-16 -3.9e-16
## HEL1:ETH10     1.1e-01  1.2e-01
## HEL1:HEL9      5.5e-02  5.5e-02
## HEL1:CSSM66   -5.2e-02 -5.5e-02
## HEL1:INRA32   -1.6e-02 -1.6e-02
## HEL1:ETH3      4.6e-01  4.9e-01
## HEL1:BM2113   -2.9e-02 -2.9e-02
## INRA35:ETH152 -2.3e-01 -2.3e-01
## INRA35:INRA23  1.1e-01  1.2e-01
## INRA35:ETH10  -2.3e-01 -2.4e-01
## INRA35:HEL9   -2.0e-01 -2.0e-01
## INRA35:CSSM66 -9.2e-02 -9.9e-02
## INRA35:INRA32 -1.8e-01 -2.0e-01
## INRA35:ETH3   -1.5e-01 -1.7e-01
## INRA35:BM2113 -6.7e-02 -7.0e-02
## ETH152:INRA23 -8.2e-02 -8.3e-02
## ETH152:ETH10   1.3e-01  1.3e-01
## ETH152:HEL9   -1.1e-01 -1.1e-01
## ETH152:CSSM66  1.8e-01  1.8e-01
## ETH152:INRA32 -1.2e-01 -1.2e-01
## ETH152:ETH3    1.1e-01  1.2e-01
## ETH152:BM2113  1.6e-01  1.6e-01
## INRA23:ETH10  -1.6e-01 -1.6e-01
## INRA23:HEL9   -1.2e-01 -1.2e-01
## INRA23:CSSM66 -5.1e-02 -5.2e-02
## INRA23:INRA32  2.5e-02  2.5e-02
## INRA23:ETH3    2.8e-02  2.8e-02
## INRA23:BM2113 -1.3e-01 -1.3e-01
## ETH10:HEL9    -2.7e-01 -2.8e-01
## ETH10:CSSM66   4.7e-01  4.8e-01
## ETH10:INRA32  -5.2e-02 -5.2e-02
## ETH10:ETH3     1.7e-01  1.8e-01
## ETH10:BM2113  -1.0e-01 -1.0e-01
## HEL9:CSSM66   -7.1e-02 -7.6e-02
## HEL9:INRA32    4.4e-01  4.7e-01
## HEL9:ETH3     -1.6e-01 -1.7e-01
## HEL9:BM2113   -1.3e-01 -1.3e-01
## CSSM66:INRA32 -1.2e-01 -1.2e-01
## CSSM66:ETH3   -2.2e-02 -2.3e-02
## CSSM66:BM2113 -7.5e-02 -7.6e-02
## INRA32:ETH3   -6.4e-02 -6.5e-02
## INRA32:BM2113 -9.2e-03 -9.3e-03
## ETH3:BM2113   -1.9e-02 -2.0e-02
lapply(seppop(myData), pair.ia, low = "black", high = "green")

## $Pop1
##                    Ia   rbarD
## INRA63:INRA5  -0.1304 -0.1337
## INRA63:ETH225 -0.0460 -0.0463
## INRA63:HEL5    0.1069  0.1070
## INRA63:HEL1   -0.0203 -0.0203
## INRA63:INRA35 -0.0205 -0.0205
## INRA63:ETH152  0.1525  0.1533
## INRA63:INRA23  0.1233  0.1233
## INRA63:ETH10   0.0184  0.0184
## INRA63:HEL9    0.0032  0.0032
## INRA63:CSSM66  0.0682  0.0682
## INRA63:INRA32 -0.0119 -0.0119
## INRA63:ETH3   -0.0217 -0.0217
## INRA63:BM2113 -0.0663 -0.0663
## INRA5:ETH225  -0.1969 -0.2072
## INRA5:HEL5     0.0314  0.0326
## INRA5:HEL1     0.0941  0.0961
## INRA5:INRA35   0.3094  0.3164
## INRA5:ETH152  -0.0499 -0.0502
## INRA5:INRA23   0.1522  0.1555
## INRA5:ETH10    0.0907  0.0943
## INRA5:HEL9     0.0399  0.0413
## INRA5:CSSM66  -0.0181 -0.0185
## INRA5:INRA32   0.1526  0.1567
## INRA5:ETH3     0.0671  0.0694
## INRA5:BM2113   0.0251  0.0258
## ETH225:HEL5   -0.0743 -0.0743
## ETH225:HEL1    0.0208  0.0209
## ETH225:INRA35 -0.1170 -0.1177
## ETH225:ETH152 -0.0151 -0.0154
## ETH225:INRA23 -0.0838 -0.0844
## ETH225:ETH10   0.0216  0.0216
## ETH225:HEL9    0.0216  0.0217
## ETH225:CSSM66  0.0063  0.0064
## ETH225:INRA32 -0.1336 -0.1341
## ETH225:ETH3   -0.0256 -0.0257
## ETH225:BM2113 -0.0343 -0.0345
## HEL5:HEL1     -0.1259 -0.1262
## HEL5:INRA35    0.1188  0.1190
## HEL5:ETH152   -0.1378 -0.1395
## HEL5:INRA23    0.0831  0.0833
## HEL5:ETH10     0.0509  0.0509
## HEL5:HEL9      0.0518  0.0518
## HEL5:CSSM66    0.2606  0.2611
## HEL5:INRA32   -0.0229 -0.0229
## HEL5:ETH3      0.0572  0.0572
## HEL5:BM2113   -0.1012 -0.1012
## HEL1:INRA35    0.3598  0.3598
## HEL1:ETH152    0.1222  0.1227
## HEL1:INRA23   -0.0141 -0.0141
## HEL1:ETH10    -0.0446 -0.0447
## HEL1:HEL9      0.0633  0.0633
## HEL1:CSSM66   -0.1475 -0.1475
## HEL1:INRA32    0.0061  0.0061
## HEL1:ETH3     -0.1091 -0.1093
## HEL1:BM2113    0.1816  0.1817
## INRA35:ETH152  0.0256  0.0257
## INRA35:INRA23  0.0773  0.0773
## INRA35:ETH10   0.0538  0.0539
## INRA35:HEL9   -0.0392 -0.0393
## INRA35:CSSM66  0.0067  0.0067
## INRA35:INRA32  0.0149  0.0149
## INRA35:ETH3   -0.0092 -0.0092
## INRA35:BM2113  0.0263  0.0264
## ETH152:INRA23  0.0833  0.0837
## ETH152:ETH10   0.0396  0.0401
## ETH152:HEL9    0.0556  0.0562
## ETH152:CSSM66 -0.0181 -0.0182
## ETH152:INRA32  0.0207  0.0208
## ETH152:ETH3   -0.0746 -0.0754
## ETH152:BM2113 -0.0916 -0.0923
## INRA23:ETH10  -0.0376 -0.0376
## INRA23:HEL9    0.0227  0.0227
## INRA23:CSSM66 -0.0761 -0.0761
## INRA23:INRA32  0.0655  0.0655
## INRA23:ETH3    0.0031  0.0031
## INRA23:BM2113  0.0244  0.0244
## ETH10:HEL9     0.0132  0.0132
## ETH10:CSSM66   0.1741  0.1744
## ETH10:INRA32  -0.0419 -0.0419
## ETH10:ETH3    -0.0262 -0.0262
## ETH10:BM2113  -0.0141 -0.0141
## HEL9:CSSM66    0.0713  0.0713
## HEL9:INRA32    0.0199  0.0199
## HEL9:ETH3     -0.0273 -0.0273
## HEL9:BM2113    0.0613  0.0613
## CSSM66:INRA32 -0.0051 -0.0051
## CSSM66:ETH3   -0.0200 -0.0200
## CSSM66:BM2113 -0.1347 -0.1347
## INRA32:ETH3    0.0398  0.0399
## INRA32:BM2113  0.0047  0.0047
## ETH3:BM2113    0.2078  0.2079
## 
## $Pop2
##                     Ia    rbarD
## INRA63:INRA5   0.00240  0.00241
## INRA63:ETH225  0.09044  0.09047
## INRA63:HEL5    0.09870  0.09897
## INRA63:HEL1   -0.01827 -0.01849
## INRA63:INRA35 -0.09862 -0.09866
## INRA63:ETH152  0.10850  0.10857
## INRA63:INRA23 -0.10702 -0.10712
## INRA63:ETH10  -0.02601 -0.02623
## INRA63:HEL9    0.08028  0.08121
## INRA63:CSSM66  0.03369  0.03425
## INRA63:INRA32  0.11670  0.11671
## INRA63:ETH3   -0.07811 -0.07844
## INRA63:BM2113  0.03009  0.03070
## INRA5:ETH225  -0.04664 -0.04666
## INRA5:HEL5    -0.00561 -0.00566
## INRA5:HEL1     0.05620  0.05646
## INRA5:INRA35  -0.10203 -0.10207
## INRA5:ETH152  -0.08975 -0.08977
## INRA5:INRA23  -0.01836 -0.01836
## INRA5:ETH10    0.06641  0.06660
## INRA5:HEL9     0.03607  0.03624
## INRA5:CSSM66   0.00287  0.00289
## INRA5:INRA32   0.08753  0.08770
## INRA5:ETH3    -0.07607 -0.07612
## INRA5:BM2113   0.16316  0.16486
## ETH225:HEL5    0.03426  0.03444
## ETH225:HEL1    0.12889  0.12991
## ETH225:INRA35 -0.04145 -0.04145
## ETH225:ETH152  0.13305  0.13305
## ETH225:INRA23 -0.28862 -0.28866
## ETH225:ETH10   0.01393  0.01400
## ETH225:HEL9   -0.02948 -0.02970
## ETH225:CSSM66  0.03368  0.03409
## ETH225:INRA32  0.01823  0.01824
## ETH225:ETH3   -0.05315 -0.05326
## ETH225:BM2113 -0.06996 -0.07101
## HEL5:HEL1     -0.01301 -0.01335
## HEL5:INRA35    0.03329  0.03347
## HEL5:ETH152    0.05271  0.05304
## HEL5:INRA23   -0.02632 -0.02651
## HEL5:ETH10    -0.05015 -0.05120
## HEL5:HEL9      0.02768  0.02839
## HEL5:CSSM66    0.16015  0.16550
## HEL5:INRA32   -0.09030 -0.09050
## HEL5:ETH3     -0.10342 -0.10486
## HEL5:BM2113    0.07527  0.07812
## HEL1:INRA35   -0.03216 -0.03242
## HEL1:ETH152   -0.03503 -0.03526
## HEL1:INRA23   -0.33821 -0.34025
## HEL1:ETH10     0.01934  0.01935
## HEL1:HEL9     -0.15429 -0.15429
## HEL1:CSSM66   -0.02219 -0.02221
## HEL1:INRA32   -0.24010 -0.24322
## HEL1:ETH3      0.08153  0.08168
## HEL1:BM2113    0.01823  0.01825
## INRA35:ETH152 -0.02894 -0.02894
## INRA35:INRA23  0.08935  0.08936
## INRA35:ETH10  -0.02801 -0.02816
## INRA35:HEL9   -0.07373 -0.07430
## INRA35:CSSM66  0.01891  0.01914
## INRA35:INRA32  0.01211  0.01212
## INRA35:ETH3    0.09723  0.09743
## INRA35:BM2113 -0.09759 -0.09906
## ETH152:INRA23  0.05224  0.05224
## ETH152:ETH10   0.00915  0.00919
## ETH152:HEL9   -0.00850 -0.00855
## ETH152:CSSM66 -0.05437 -0.05495
## ETH152:INRA32  0.01172  0.01173
## ETH152:ETH3    0.10525  0.10541
## ETH152:BM2113 -0.01431 -0.01450
## INRA23:ETH10   0.00062  0.00062
## INRA23:HEL9    0.10540  0.10602
## INRA23:CSSM66  0.02845  0.02873
## INRA23:INRA32  0.16911  0.16933
## INRA23:ETH3    0.02650  0.02653
## INRA23:BM2113 -0.01882 -0.01905
## ETH10:HEL9    -0.10798 -0.10800
## ETH10:CSSM66  -0.04415 -0.04422
## ETH10:INRA32   0.15846  0.15997
## ETH10:ETH3     0.04830  0.04833
## ETH10:BM2113   0.12980  0.13012
## HEL9:CSSM66    0.01308  0.01308
## HEL9:INRA32    0.05656  0.05728
## HEL9:ETH3     -0.06605 -0.06616
## HEL9:BM2113   -0.01292 -0.01294
## CSSM66:INRA32 -0.09074 -0.09241
## CSSM66:ETH3   -0.03896 -0.03913
## CSSM66:BM2113  0.04368  0.04369
## INRA32:ETH3    0.03892  0.03911
## INRA32:BM2113  0.02722  0.02781
## ETH3:BM2113   -0.03823 -0.03846
## 
## $Pop3
##                    Ia   rbarD
## INRA63:INRA5   0.0037  0.0037
## INRA63:ETH225  0.1347  0.1348
## INRA63:HEL5    0.2255  0.2259
## INRA63:HEL1    0.0166  0.0171
## INRA63:INRA35  0.0615  0.0622
## INRA63:ETH152  0.2657  0.2724
## INRA63:INRA23  0.1710  0.1719
## INRA63:ETH10  -0.0612 -0.0615
## INRA63:HEL9   -0.0125 -0.0125
## INRA63:CSSM66  0.0146  0.0148
## INRA63:INRA32 -0.0805 -0.0969
## INRA63:ETH3    0.0361  0.0361
## INRA63:BM2113 -0.0574 -0.0578
## INRA5:ETH225   0.2735  0.2756
## INRA5:HEL5    -0.0665 -0.0666
## INRA5:HEL1    -0.0982 -0.1038
## INRA5:INRA35   0.1043  0.1045
## INRA5:ETH152  -0.0708 -0.0713
## INRA5:INRA23  -0.0878 -0.0894
## INRA5:ETH10   -0.0646 -0.0658
## INRA5:HEL9     0.0474  0.0478
## INRA5:CSSM66  -0.0957 -0.0958
## INRA5:INRA32   0.2183  0.2777
## INRA5:ETH3    -0.1280 -0.1290
## INRA5:BM2113   0.1855  0.1894
## ETH225:HEL5    0.0163  0.0164
## ETH225:HEL1    0.0760  0.0778
## ETH225:INRA35  0.1175  0.1195
## ETH225:ETH152  0.0268  0.0276
## ETH225:INRA23  0.0393  0.0394
## ETH225:ETH10   0.0393  0.0394
## ETH225:HEL9   -0.0476 -0.0476
## ETH225:CSSM66  0.1520  0.1539
## ETH225:INRA32  0.2930  0.3472
## ETH225:ETH3    0.2273  0.2273
## ETH225:BM2113  0.2332  0.2340
## HEL5:HEL1     -0.0349 -0.0365
## HEL5:INRA35    0.0163  0.0164
## HEL5:ETH152    0.0150  0.0153
## HEL5:INRA23    0.1525  0.1543
## HEL5:ETH10    -0.0901 -0.0912
## HEL5:HEL9      0.1342  0.1347
## HEL5:CSSM66   -0.0362 -0.0363
## HEL5:INRA32   -0.0336 -0.0417
## HEL5:ETH3      0.1113  0.1117
## HEL5:BM2113    0.0245  0.0249
## HEL1:INRA35   -0.0215 -0.0232
## HEL1:ETH152    0.0955  0.1061
## HEL1:INRA23    0.0918  0.0928
## HEL1:ETH10     0.1245  0.1258
## HEL1:HEL9      0.1442  0.1474
## HEL1:CSSM66    0.2192  0.2345
## HEL1:INRA32    0.1740  0.1870
## HEL1:ETH3      0.0534  0.0545
## HEL1:BM2113    0.1437  0.1449
## INRA35:ETH152 -0.0335 -0.0336
## INRA35:INRA23  0.0023  0.0023
## INRA35:ETH10   0.0884  0.0912
## INRA35:HEL9    0.3394  0.3454
## INRA35:CSSM66 -0.0186 -0.0186
## INRA35:INRA32  0.1562  0.2063
## INRA35:ETH3    0.0938  0.0954
## INRA35:BM2113  0.0754  0.0780
## ETH152:INRA23  0.1255  0.1321
## ETH152:ETH10  -0.0518 -0.0545
## ETH152:HEL9    0.0097  0.0100
## ETH152:CSSM66  0.0212  0.0212
## ETH152:INRA32 -0.0458 -0.0634
## ETH152:ETH3    0.0097  0.0100
## ETH152:BM2113 -0.0358 -0.0378
## INRA23:ETH10  -0.1717 -0.1717
## INRA23:HEL9    0.0010  0.0010
## INRA23:CSSM66  0.1798  0.1845
## INRA23:INRA32 -0.0057 -0.0065
## INRA23:ETH3    0.0812  0.0814
## INRA23:BM2113 -0.1196 -0.1196
## ETH10:HEL9     0.1079  0.1081
## ETH10:CSSM66   0.2020  0.2072
## ETH10:INRA32  -0.0905 -0.1035
## ETH10:ETH3    -0.0524 -0.0525
## ETH10:BM2113   0.0251  0.0251
## HEL9:CSSM66    0.0552  0.0559
## HEL9:INRA32    0.0205  0.0243
## HEL9:ETH3      0.1969  0.1969
## HEL9:BM2113    0.0415  0.0416
## CSSM66:INRA32 -0.0794 -0.1032
## CSSM66:ETH3    0.1606  0.1627
## CSSM66:BM2113 -0.0150 -0.0154
## INRA32:ETH3    0.1359  0.1607
## INRA32:BM2113  0.2219  0.2522
## ETH3:BM2113   -0.0126 -0.0127
## 
## $Pop4
##                    Ia   rbarD
## INRA63:INRA5   0.0189  0.0193
## INRA63:ETH225  0.2322  0.2355
## INRA63:HEL5   -0.0103 -0.0105
## INRA63:HEL1   -0.0974 -0.0976
## INRA63:INRA35 -0.0395 -0.0406
## INRA63:ETH152  0.2571  0.2644
## INRA63:INRA23 -0.0532 -0.0533
## INRA63:ETH10  -0.0815 -0.0819
## INRA63:HEL9    0.0645  0.0646
## INRA63:CSSM66  0.0268  0.0272
## INRA63:INRA32  0.0590  0.0590
## INRA63:ETH3   -0.1206 -0.1239
## INRA63:BM2113  0.0758  0.0775
## INRA5:ETH225  -0.0587 -0.0588
## INRA5:HEL5     0.1712  0.1713
## INRA5:HEL1     0.0672  0.0694
## INRA5:INRA35  -0.0071 -0.0071
## INRA5:ETH152   0.0622  0.0622
## INRA5:INRA23   0.0609  0.0616
## INRA5:ETH10   -0.0352 -0.0368
## INRA5:HEL9    -0.0529 -0.0534
## INRA5:CSSM66   0.0601  0.0602
## INRA5:INRA32  -0.0877 -0.0901
## INRA5:ETH3    -0.0537 -0.0537
## INRA5:BM2113   0.0067  0.0067
## ETH225:HEL5   -0.1089 -0.1089
## ETH225:HEL1    0.0468  0.0479
## ETH225:INRA35 -0.0465 -0.0466
## ETH225:ETH152 -0.0421 -0.0422
## ETH225:INRA23  0.1063  0.1070
## ETH225:ETH10   0.0405  0.0419
## ETH225:HEL9   -0.0128 -0.0129
## ETH225:CSSM66 -0.0169 -0.0169
## ETH225:INRA32  0.0823  0.0839
## ETH225:ETH3    0.1292  0.1295
## ETH225:BM2113  0.0455  0.0456
## HEL5:HEL1      0.1188  0.1217
## HEL5:INRA35   -0.1739 -0.1742
## HEL5:ETH152    0.0103  0.0103
## HEL5:INRA23    0.0965  0.0972
## HEL5:ETH10    -0.0408 -0.0422
## HEL5:HEL9     -0.0837 -0.0842
## HEL5:CSSM66    0.0769  0.0769
## HEL5:INRA32    0.0423  0.0431
## HEL5:ETH3      0.0135  0.0135
## HEL5:BM2113   -0.0752 -0.0753
## HEL1:INRA35    0.0277  0.0288
## HEL1:ETH152   -0.0211 -0.0220
## HEL1:INRA23   -0.0687 -0.0690
## HEL1:ETH10     0.0070  0.0070
## HEL1:HEL9     -0.0885 -0.0891
## HEL1:CSSM66    0.0443  0.0453
## HEL1:INRA32    0.1087  0.1087
## HEL1:ETH3      0.0382  0.0397
## HEL1:BM2113   -0.0175 -0.0181
## INRA35:ETH152  0.2385  0.2385
## INRA35:INRA23 -0.1849 -0.1879
## INRA35:ETH10   0.0847  0.0892
## INRA35:HEL9    0.0486  0.0493
## INRA35:CSSM66 -0.0013 -0.0013
## INRA35:INRA32 -0.1815 -0.1876
## INRA35:ETH3   -0.0759 -0.0759
## INRA35:BM2113 -0.0816 -0.0816
## ETH152:INRA23 -0.0750 -0.0763
## ETH152:ETH10   0.0019  0.0020
## ETH152:HEL9    0.0702  0.0713
## ETH152:CSSM66 -0.1483 -0.1487
## ETH152:INRA32  0.0497  0.0515
## ETH152:ETH3   -0.1083 -0.1083
## ETH152:BM2113 -0.0710 -0.0710
## INRA23:ETH10  -0.1717 -0.1735
## INRA23:HEL9   -0.1250 -0.1250
## INRA23:CSSM66 -0.1071 -0.1078
## INRA23:INRA32  0.1609  0.1614
## INRA23:ETH3   -0.0440 -0.0447
## INRA23:BM2113  0.0526  0.0533
## ETH10:HEL9     0.0892  0.0902
## ETH10:CSSM66  -0.0588 -0.0608
## ETH10:INRA32  -0.0321 -0.0322
## ETH10:ETH3     0.0824  0.0868
## ETH10:BM2113   0.0140  0.0146
## HEL9:CSSM66    0.0828  0.0833
## HEL9:INRA32   -0.0896 -0.0899
## HEL9:ETH3     -0.0259 -0.0263
## HEL9:BM2113    0.0618  0.0625
## CSSM66:INRA32 -0.0672 -0.0685
## CSSM66:ETH3   -0.0283 -0.0283
## CSSM66:BM2113  0.0338  0.0339
## INRA32:ETH3   -0.0743 -0.0768
## INRA32:BM2113 -0.0879 -0.0904
## ETH3:BM2113   -0.1078 -0.1078
## 
## $Pop5
##                    Ia   rbarD
## INRA63:INRA5  -0.1076 -0.1100
## INRA63:ETH225  0.1188  0.1203
## INRA63:HEL5    0.1641  0.1641
## INRA63:HEL1    0.0354  0.0363
## INRA63:INRA35  0.1662  0.1662
## INRA63:ETH152  0.0955  0.0955
## INRA63:INRA23  0.0663  0.0665
## INRA63:ETH10  -0.0278 -0.0280
## INRA63:HEL9   -0.0484 -0.0484
## INRA63:CSSM66 -0.1827 -0.1831
## INRA63:INRA32 -0.0815 -0.0819
## INRA63:ETH3   -0.0355 -0.0359
## INRA63:BM2113 -0.0633 -0.0633
## INRA5:ETH225   0.0697  0.0697
## INRA5:HEL5    -0.0767 -0.0790
## INRA5:HEL1    -0.1095 -0.1198
## INRA5:INRA35  -0.0068 -0.0070
## INRA5:ETH152  -0.0085 -0.0087
## INRA5:INRA23   0.1922  0.1941
## INRA5:ETH10   -0.0948 -0.0952
## INRA5:HEL9     0.3757  0.3829
## INRA5:CSSM66  -0.0656 -0.0662
## INRA5:INRA32   0.0078  0.0079
## INRA5:ETH3    -0.0051 -0.0051
## INRA5:BM2113  -0.0731 -0.0747
## ETH225:HEL5    0.1444  0.1472
## ETH225:HEL1   -0.1225 -0.1315
## ETH225:INRA35  0.0855  0.0863
## ETH225:ETH152  0.1508  0.1521
## ETH225:INRA23  0.2299  0.2309
## ETH225:ETH10   0.0943  0.0944
## ETH225:HEL9    0.0992  0.1003
## ETH225:CSSM66  0.0231  0.0232
## ETH225:INRA32 -0.0310 -0.0311
## ETH225:ETH3   -0.1444 -0.1444
## ETH225:BM2113 -0.0379 -0.0384
## HEL5:HEL1     -0.0095 -0.0097
## HEL5:INRA35    0.2069  0.2073
## HEL5:ETH152    0.0426  0.0426
## HEL5:INRA23    0.0382  0.0384
## HEL5:ETH10    -0.0398 -0.0403
## HEL5:HEL9     -0.0489 -0.0490
## HEL5:CSSM66    0.0299  0.0300
## HEL5:INRA32   -0.0417 -0.0420
## HEL5:ETH3     -0.1613 -0.1642
## HEL5:BM2113   -0.0199 -0.0199
## HEL1:INRA35    0.1201  0.1238
## HEL1:ETH152   -0.0667 -0.0688
## HEL1:INRA23   -0.0567 -0.0591
## HEL1:ETH10     0.0589  0.0625
## HEL1:HEL9      0.0155  0.0159
## HEL1:CSSM66   -0.0323 -0.0337
## HEL1:INRA32    0.0654  0.0687
## HEL1:ETH3      0.2538  0.2719
## HEL1:BM2113    0.2219  0.2276
## INRA35:ETH152 -0.0344 -0.0344
## INRA35:INRA23 -0.0560 -0.0561
## INRA35:ETH10  -0.0194 -0.0195
## INRA35:HEL9    0.3655  0.3655
## INRA35:CSSM66 -0.0138 -0.0138
## INRA35:INRA32  0.4635  0.4646
## INRA35:ETH3    0.0199  0.0201
## INRA35:BM2113 -0.0652 -0.0652
## ETH152:INRA23  0.0865  0.0866
## ETH152:ETH10   0.1600  0.1608
## ETH152:HEL9    0.0713  0.0713
## ETH152:CSSM66 -0.0297 -0.0298
## ETH152:INRA32  0.0365  0.0366
## ETH152:ETH3   -0.1134 -0.1143
## ETH152:BM2113  0.2145  0.2146
## INRA23:ETH10   0.2513  0.2517
## INRA23:HEL9    0.0974  0.0975
## INRA23:CSSM66  0.0625  0.0625
## INRA23:INRA32  0.0563  0.0564
## INRA23:ETH3   -0.0050 -0.0050
## INRA23:BM2113 -0.1738 -0.1742
## ETH10:HEL9    -0.0892 -0.0898
## ETH10:CSSM66  -0.0393 -0.0394
## ETH10:INRA32   0.2136  0.2137
## ETH10:ETH3    -0.0103 -0.0103
## ETH10:BM2113   0.0375  0.0378
## HEL9:CSSM66    0.0451  0.0452
## HEL9:INRA32    0.1628  0.1634
## HEL9:ETH3     -0.0437 -0.0442
## HEL9:BM2113   -0.0390 -0.0390
## CSSM66:INRA32  0.0835  0.0836
## CSSM66:ETH3    0.0661  0.0663
## CSSM66:BM2113 -0.0339 -0.0340
## INRA32:ETH3    0.1202  0.1204
## INRA32:BM2113 -0.0166 -0.0167
## ETH3:BM2113    0.1059  0.1070
## 
## $Pop6
##                     Ia    rbarD
## INRA63:INRA5  -2.5e-03 -2.5e-03
## INRA63:ETH225 -5.4e-02 -5.4e-02
## INRA63:HEL5   -2.4e-01 -2.4e-01
## INRA63:HEL1   -1.4e-01 -1.4e-01
## INRA63:INRA35 -9.3e-02 -9.6e-02
## INRA63:ETH152 -4.8e-02 -4.8e-02
## INRA63:INRA23 -1.1e-02 -1.1e-02
## INRA63:ETH10   8.5e-02  8.6e-02
## INRA63:HEL9    5.7e-02  5.8e-02
## INRA63:CSSM66  1.2e-01  1.2e-01
## INRA63:INRA32  1.5e-01  1.5e-01
## INRA63:ETH3   -1.0e-01 -1.1e-01
## INRA63:BM2113 -1.5e-01 -1.5e-01
## INRA5:ETH225  -2.4e-01 -2.4e-01
## INRA5:HEL5    -1.4e-01 -1.5e-01
## INRA5:HEL1    -2.1e-01 -2.1e-01
## INRA5:INRA35   2.0e-02  2.1e-02
## INRA5:ETH152   6.4e-02  6.4e-02
## INRA5:INRA23   1.1e-02  1.1e-02
## INRA5:ETH10   -5.6e-03 -5.6e-03
## INRA5:HEL9    -2.0e-01 -2.0e-01
## INRA5:CSSM66   3.1e-03  3.1e-03
## INRA5:INRA32  -7.0e-02 -7.0e-02
## INRA5:ETH3    -1.6e-01 -1.6e-01
## INRA5:BM2113  -9.9e-02 -9.9e-02
## ETH225:HEL5    8.5e-03  8.6e-03
## ETH225:HEL1    3.0e-01  3.0e-01
## ETH225:INRA35  1.2e-01  1.3e-01
## ETH225:ETH152 -1.7e-01 -1.7e-01
## ETH225:INRA23 -2.3e-02 -2.3e-02
## ETH225:ETH10  -5.2e-02 -5.2e-02
## ETH225:HEL9   -2.2e-02 -2.3e-02
## ETH225:CSSM66 -2.6e-01 -2.6e-01
## ETH225:INRA32 -2.4e-02 -2.4e-02
## ETH225:ETH3    2.3e-01  2.4e-01
## ETH225:BM2113  3.6e-02  3.6e-02
## HEL5:HEL1      1.7e-01  1.8e-01
## HEL5:INRA35   -1.5e-01 -1.7e-01
## HEL5:ETH152    2.8e-02  2.9e-02
## HEL5:INRA23    1.1e-01  1.1e-01
## HEL5:ETH10     1.4e-01  1.4e-01
## HEL5:HEL9      2.2e-01  2.4e-01
## HEL5:CSSM66    5.0e-02  5.0e-02
## HEL5:INRA32   -2.2e-02 -2.2e-02
## HEL5:ETH3      1.7e-02  1.7e-02
## HEL5:BM2113   -8.0e-02 -8.1e-02
## HEL1:INRA35   -2.1e-01 -2.1e-01
## HEL1:ETH152    1.1e-01  1.1e-01
## HEL1:INRA23   -3.3e-16 -3.9e-16
## HEL1:ETH10     1.1e-01  1.2e-01
## HEL1:HEL9      5.5e-02  5.5e-02
## HEL1:CSSM66   -5.2e-02 -5.5e-02
## HEL1:INRA32   -1.6e-02 -1.6e-02
## HEL1:ETH3      4.6e-01  4.9e-01
## HEL1:BM2113   -2.9e-02 -2.9e-02
## INRA35:ETH152 -2.3e-01 -2.3e-01
## INRA35:INRA23  1.1e-01  1.2e-01
## INRA35:ETH10  -2.3e-01 -2.4e-01
## INRA35:HEL9   -2.0e-01 -2.0e-01
## INRA35:CSSM66 -9.2e-02 -9.9e-02
## INRA35:INRA32 -1.8e-01 -2.0e-01
## INRA35:ETH3   -1.5e-01 -1.7e-01
## INRA35:BM2113 -6.7e-02 -7.0e-02
## ETH152:INRA23 -8.2e-02 -8.3e-02
## ETH152:ETH10   1.3e-01  1.3e-01
## ETH152:HEL9   -1.1e-01 -1.1e-01
## ETH152:CSSM66  1.8e-01  1.8e-01
## ETH152:INRA32 -1.2e-01 -1.2e-01
## ETH152:ETH3    1.1e-01  1.2e-01
## ETH152:BM2113  1.6e-01  1.6e-01
## INRA23:ETH10  -1.6e-01 -1.6e-01
## INRA23:HEL9   -1.2e-01 -1.2e-01
## INRA23:CSSM66 -5.1e-02 -5.2e-02
## INRA23:INRA32  2.5e-02  2.5e-02
## INRA23:ETH3    2.8e-02  2.8e-02
## INRA23:BM2113 -1.3e-01 -1.3e-01
## ETH10:HEL9    -2.7e-01 -2.8e-01
## ETH10:CSSM66   4.7e-01  4.8e-01
## ETH10:INRA32  -5.2e-02 -5.2e-02
## ETH10:ETH3     1.7e-01  1.8e-01
## ETH10:BM2113  -1.0e-01 -1.0e-01
## HEL9:CSSM66   -7.1e-02 -7.6e-02
## HEL9:INRA32    4.4e-01  4.7e-01
## HEL9:ETH3     -1.6e-01 -1.7e-01
## HEL9:BM2113   -1.3e-01 -1.3e-01
## CSSM66:INRA32 -1.2e-01 -1.2e-01
## CSSM66:ETH3   -2.2e-02 -2.3e-02
## CSSM66:BM2113 -7.5e-02 -7.6e-02
## INRA32:ETH3   -6.4e-02 -6.5e-02
## INRA32:BM2113 -9.2e-03 -9.3e-03
## ETH3:BM2113   -1.9e-02 -2.0e-02

Descriptive statistics

There are some typical descriptive statistics you will find in most population genetics papers. These summary statistics give us an overview of some important features of populations under study.

We will focus on the basic ones describing variation within populations and describing variation between populations.

Note: many of these can be calculated over all populations as well as per population or between pairs of populations. This offers information at different scales.


Heterozygosity

If we are interested in genetic variation in natural populations we often consider heterozygosity.

High heterozygosity means a lot of genetic variability.

Low heterozygosity means little genetic variability.


Let’s consider heterozygosity in a simple system with two alleles at a locus: A and a. Let’s also assume that this population is in HWE.

Then:

\(p = f(A)\)

\(q = f(a)\)

So under HWE, we obtain the following mating table:

        A        a
  A     p^2      pq
  a     qp       q^2

So \(pq + qp = 2pq\) gives the frequency of heterozygote genotypes.

In this two-allele system, heterozygosity is highest at \(p = 0.5\). Let’s visualize:

freqP <- seq(from = 0, to = 1, by = 0.01)
freqQ <- 1 - freqP

plot(2*freqP*freqQ ~ freqQ, type = "l",
     ylab = "frequency of heterozygote genotypes",
     xlab = "frequency of allele 'a'")
text(0, 0, "AA")
text(1, 0, "aa")
arrows(0.45, 0, 0.05, 0)
text(0.5, 0, "Aa")
arrows(0.55, 0, 0.95, 0)

As you can imagine, this becomes more complex when there are more alleles per locus!

Questions?


Now we will calculate the observed heterozygosity in our populations, as well as the expected heterozygosity if the populations are in HWE.

For this we use the summary() function from adegenet. We can extract the observed heterozygosity using $Hobs and the expected heterozygosity using $Hexp.

To make life convenient, you can extract the values for every locus, place them in a data.frame and compute the difference:

heteroz <- data.frame(Hexp = summary(myData)$Hexp, Hobs = summary(myData)$Hobs)
heteroz$diff <- heteroz$Hexp - heteroz$Hobs
heteroz

We can also visualize how observed heterozygosity is different from expected heterozygosity by subtracting one from the other.

If you remember from before, we saw that loci were not in HWE over all populations.

barplot(heteroz$diff, names.arg = rownames(heteroz),
        main = "Heterozygosity: expected-observed",
        xlab = "", ylab = "Hexp - Hobs", font.lab = 2, las = 2)

Or we can use another representation, using ggplot2 for a change:

heteroz$loci <- rownames(heteroz) ## ggplot needs names stored as a column

ggplot(heteroz, aes(y = Hexp, x = Hobs)) +
  geom_segment(aes(y = Hexp - 0.01, yend = Hobs, xend = Hobs), linetype = "dashed") +
  geom_text(aes(label = loci), size = 3) +
  geom_abline(slope = 1) + 
  scale_x_continuous(limits = range(c(heteroz$Hobs, heteroz$Hexp))) +
  scale_y_continuous(limits = range(c(heteroz$Hobs, heteroz$Hexp))) +
  labs(title = "Heterozygosity: expected vs observed") +
  xlab(expression(bold("Observed heterozygosity"))) +
  ylab(expression(bold("Expected heterozygosity"))) +
  theme_classic() +
  coord_fixed()


Now let’s consider observed and expected heterozygosity by population. Here, we will focus on the mean across loci.

adegenet conveniently provides us with a function to calculate this for expected heterozygosity with Hs().

Hs(myData)
##      Pop1      Pop2      Pop3      Pop4      Pop5      Pop6 
## 0.7038393 0.7051239 0.5183017 0.5654010 0.6011010 0.5996094

To obtain observed heterozygosity there is no short cut (yet), so we need to use lapply() and seppop() again. Here we will calculate the mean of $Hobs per population in myData. As the output of seppop is a list, we also use unlist() to get a simple vector.

Hobs <- lapply(seppop(myData), function(x) mean(summary(x)$Hobs))
Hobs <- unlist(Hobs)
Hobs
##      Pop1      Pop2      Pop3      Pop4      Pop5      Pop6 
## 0.6964286 0.6751583 0.5105442 0.5054187 0.5796228 0.5982143

Again, we can store the results for observed and expected heterozygosity into a data.frame and compute the difference:

heteroz_per_pop <- data.frame(Hexp = Hs(myData), Hobs = Hobs)
heteroz_per_pop$diff <- heteroz_per_pop$Hexp - heteroz_per_pop$Hobs
heteroz_per_pop

We have already tested if these are significantly different above (while testing for the HWE).


Geeky note:

To look at both the effect of the locus and the population at once, you can write some custom code:

heteroz_matrix <- do.call("cbind", lapply(seppop(myData),
                                          function(x) summary(x)$Hexp - summary(x)$Hobs))
heteroz_matrix
##            Pop1         Pop2         Pop3        Pop4        Pop5         Pop6
## INRA63 -0.00625  0.175555556  0.055555556  0.10000000  0.09814050  0.029296875
## INRA5   0.18500 -0.006666667  0.157777778 -0.04388889  0.16632231 -0.005859375
## ETH225 -0.08875  0.128888889 -0.180000000 -0.01111111  0.09637188 -0.078125000
## HEL5    0.14750  0.293697979  0.080000000  0.05111111 -0.03822314 -0.097656250
## HEL1   -0.08375 -0.031111111 -0.055555556  0.18668252 -0.06301653  0.015625000
## INRA35 -0.04000  0.132777778  0.191111111  0.05053508  0.06508264  0.134765625
## ETH152  0.07250 -0.002222222  0.221938776 -0.09833333 -0.01859504  0.121093750
## INRA23  0.07375 -0.058673469 -0.068888889 -0.02913199  0.12603306  0.214843750
## ETH10  -0.08875 -0.097777778 -0.068888889  0.24333333  0.13326446 -0.078125000
## HEL9   -0.05500  0.033888889 -0.006666667  0.22833333 -0.07541322  0.066406250
## CSSM66 -0.06375 -0.042777778  0.124444444  0.10555556 -0.03719008 -0.001953125
## INRA32  0.17750  0.026159334 -0.002222222  0.10611111  0.07954545 -0.169921875
## ETH3   -0.05625 -0.028333333 -0.197777778 -0.07277778 -0.08390023 -0.109375000
## BM2113 -0.07000 -0.103888889 -0.142222222  0.02333333 -0.14772727 -0.021484375

This can be easily plotted with lattice:

levelplot(heteroz_matrix, col.regions = viridis(100),
          main = "Heterozygosity: expected-observed",
          xlab = "Locus", ylab = "Population")


F-statistics

An effect of population subdivision on genetic variation is the reduction in observed heterozygotes. As we have seen previously (Wahlund effect).

The extent of reduction in observed heterozygotes can be used to quantify the level of differentiation between sub-populations.

This quantification is formalised in a series of hierarchical F-statistics.

–> We are using a measure of departure from HWE to quantify the extent of differentiation between populations.


F-statistics also provide a way to quantify the level of heterozygosity at the individual level, and if/how this departs from expectations of HWE in the population.


How are these linked?

  • I = individual
  • S = sub-population
  • T = total-population

We will focus on the two most common F-statistics: \(F_{IS}\) and \(F_{ST}\).

  • \(F_{IS}\) is known as the inbreeding coefficient.

    • this essentially measures departures from random-mating.
    • how does heterozygosity in individuals compare to that in the subpopulation?
    • \(F_{IS} = (H_S - H_I)/H_S\)
  • \(F_{ST}\) is known as the fixation index.

    • measures the extent of genetic differentiation among subpopulations.
    • how is heterozygosity in the subpopulation reduced relative to the total population.
    • \(F_{ST} = (H_T - H_S)/H_T\)

Let’s calculate \(F_{ST}\) for our previous example.

             AA    Aa    aa
      Pop1   50     0     0
      Pop2    0     0    50

Here, \(p = 0.5\) and \(q = 0.5\).

Our expected heterozygosity for the total population (i.e., Pop1 and Pop2 together) is given by

\(H_T = 2pq = 2 \times 0.5 \times 0.5 = 0.5\)

As we have no variation in the sub-populations (i.e., Pop1 or Pop2) \(H_S = 0\) because \(2pq = 2 \times 1 \times 0\)

Then

\(F_{ST} = (0.5 - 0) / 0.5 = 1.0\)


Note: this has been a very simple exploration of how F-statistics are calculated. This has been expanded upon, and not all software or R packages calculate F-stats in the same way. For example, some account for factors such as how individuals disperse (island model vs stepping-stone model), the mutation process (infinite alleles model vs step-wise mutation model) and other bias (e.g. taking into account sampling bias). It is thus important to report (and if possible understand) which method you use:

To cite a package:

citation(package = "pegas")
## 
## To cite pegas in a publication use:
## 
##   Paradis E. 2010. pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics 26: 419-420.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {pegas: an {R} package for population genetics with an integrated--modular approach},
##     author = {E. Paradis},
##     journal = {Bioinformatics},
##     year = {2010},
##     volume = {26},
##     pages = {419-420},
##   }
## 
## As pegas is evolving quickly, you may want to cite also its version number (found with 'library(help = pegas)').
packageVersion("pegas") ## don't forget to cite the version number!
## [1] '0.14'

Now let’s get back to myData.

We can use the Fst() function of the pegas package to calculate F-stats by locus, over all populations. But it requires the loci format, so we need to use as.loci() to transform our data.

Fst(as.loci(myData))
##               Fit        Fst         Fis
## INRA63 0.33802078 0.20747328  0.16472316
## INRA5  0.18137553 0.04798419  0.14011462
## ETH225 0.11686741 0.09742513  0.02154091
## HEL5   0.21269341 0.07811146  0.14598506
## HEL1   0.31398545 0.28907109  0.03504480
## INRA35 0.29918890 0.07347563  0.24361288
## ETH152 0.21049163 0.15643982  0.06407582
## INRA23 0.17236638 0.10827954  0.07186876
## ETH10  0.21569817 0.16348381  0.06241883
## HEL9   0.24349243 0.16970350  0.08887058
## CSSM66 0.09195957 0.05406492  0.04006051
## INRA32 0.26891906 0.17504229  0.11379585
## ETH3   0.04997358 0.13763044 -0.10164651
## BM2113 0.06567225 0.12927368 -0.07304411

How would you interpret these values?

We can also build a data.frame with all the information we need:

data.frame(Hobs = summary(myData)$Hobs, Hexp = summary(myData)$Hexp, Fst(as.loci(myData))[, c("Fst", "Fis")])

Often we are also interested in specific \(F_{ST}\) values between pairs of populations. Genetic differentiation among all populations does not need to be equal.

Let’s look at \(F_{ST}\) between pops 1 and 2. We will extract the first two populations from myData and turn this into loci format using as.loci().

myData_pop12 <- as.loci(myData[pop = c("Pop1", "Pop2")])

Now we estimate \(F_{ST}\) over all loci, obtained by using mean(), and constrain ourselves to the column of the output with the \(F_{ST}\) values:

mean(Fst(myData_pop12)[, "Fst"])
## [1] 0.04939851

Now we have a measure of differentiation between Pop1 and Pop2.

This pairwise \(F_{ST}\) value is the mean of \(F_{ST}\) values of all loci. We can check \(F_{ST}\) for every locus using Fst(myData_pop12)[, "Fst"].

As any estimate, the \(F_{ST}\) is measured with some uncertainty. To represent this uncertainty, we can compute the 95\(\%\) confidence interval of the mean \(F_{ST}\) value using a “very simple” bootstrap (more advanced implementations are possible):

Fst_boot <- replicate(100, {
  myData_pop12 <- myData_pop12[, c(1, sample(attr(myData_pop12, "loci"), replace = TRUE)[-1])]
  mean(Fst(myData_pop12)[, "Fst"])
})
quantile(Fst_boot, c(0.025, 0.975))
##       2.5%      97.5% 
## 0.02409615 0.07172776

Let’s plot the distribution of \(F_{ST}\) we just obtained:

hist(Fst_boot, xlim = c(0, 0.2))
abline(v = 0, col = 2, lwd = 2, lty = 2)

Geeky note:

For now, there is no simple solution to compute pairwise \(F_{ST}\) in R, but here is a solution using a home made function we prepared for you:

pairwise_F(myData, confint = FALSE) ## without Confidence Interval
##            Pop1       Pop2       Pop3       Pop4       Pop5      Pop6
## Pop1         NA 0.04939851 0.14266348 0.10909326 0.07058474 0.1335223
## Pop2 0.04939851         NA 0.18210328 0.15354673 0.13306532 0.1896989
## Pop3 0.14266348 0.18210328         NA 0.09044349 0.06150365 0.2385422
## Pop4 0.10909326 0.15354673 0.09044349         NA 0.03950569 0.2062700
## Pop5 0.07058474 0.13306532 0.06150365 0.03950569         NA 0.1627551
## Pop6 0.13352231 0.18969892 0.23854224 0.20626996 0.16275506        NA

Let’s do the same thing while estimating the amount of uncertainty:

pairwise_Fst <- pairwise_F(myData, confint = TRUE) ## with CI
lapply(pairwise_Fst, round, digit = 2) ## output after rounding with 2 digits
## $mean
##      Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## Pop1   NA 0.05 0.14 0.11 0.07 0.13
## Pop2 0.05   NA 0.18 0.15 0.13 0.19
## Pop3 0.14 0.18   NA 0.09 0.06 0.24
## Pop4 0.11 0.15 0.09   NA 0.04 0.21
## Pop5 0.07 0.13 0.06 0.04   NA 0.16
## Pop6 0.13 0.19 0.24 0.21 0.16   NA
## 
## $upr
##      Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## Pop1   NA 0.07 0.17 0.14 0.10 0.19
## Pop2 0.07   NA 0.24 0.18 0.17 0.24
## Pop3 0.17 0.24   NA 0.11 0.08 0.30
## Pop4 0.14 0.18 0.11   NA 0.06 0.26
## Pop5 0.10 0.17 0.08 0.06   NA 0.22
## Pop6 0.19 0.24 0.30 0.26 0.22   NA
## 
## $lwr
##      Pop1 Pop2 Pop3 Pop4 Pop5 Pop6
## Pop1   NA 0.03 0.10 0.08 0.05 0.08
## Pop2 0.03   NA 0.12 0.13 0.10 0.15
## Pop3 0.10 0.12   NA 0.07 0.04 0.17
## Pop4 0.08 0.13 0.07   NA 0.02 0.14
## Pop5 0.05 0.10 0.04 0.02   NA 0.09
## Pop6 0.08 0.15 0.17 0.14 0.09   NA

We can plot this output:

levelplot(pairwise_Fst$mean, col.regions = rev(grey.colors(30)))


Private alleles

It is worth checking if your populations have alleles that cannot be found in other populations. These are called “private alleles.”

We can use the private_alleles() function of poppr to get information about which alleles can only be found in a population.

Alleles are labelled as locus.allele.

private_alleles(myData)
##      INRA63.171 INRA5.149 ETH225.141 ETH225.137 HEL5.149 HEL5.159 INRA35.108 INRA35.106 ETH152.205 INRA23.203 INRA23.217 INRA23.193 ETH10.223 ETH10.213
## Pop1          0         0          1          0        2        0          0          0          0          0          0          0         0         0
## Pop2          1         1          0          0        0        0          3          0          0          2          0          0         0         0
## Pop3          0         0          0          0        0        0          0          0          0          0          1          0         0         0
## Pop4          0         0          0          1        0        0          0          1          0          0          0          0         1         0
## Pop5          0         0          0          0        0        0          0          0          0          0          0          1         0         0
## Pop6          0         0          0          0        0        1          0          0          1          0          0          0         0         1
##      HEL9.149 HEL9.151 CSSM66.171 CSSM66.191 INRA32.202 INRA32.164 INRA32.160 INRA32.168 ETH3.103 ETH3.113 BM2113.124 BM2113.150 BM2113.128 BM2113.132
## Pop1        0        0          0          0          1          1          0          0        1        0          0          0          0          0
## Pop2        1        0          0          0          0          0          1          0        0        0          0          0          0          0
## Pop3        0        0          0          0          0          0          0          0        0        0          9          0          0          0
## Pop4        0        0          0          0          0          0          0          0        0        0          0          0          0          0
## Pop5        0        1          1          0          0          0          0          0        0        1          0          1          0          0
## Pop6        0        0          0          1          0          0          0          1        0        0          0          0          5         12
##      BM2113.126 BM2113.144
## Pop1          0          0
## Pop2          0          0
## Pop3          0          0
## Pop4          0          0
## Pop5          0          0
## Pop6          2          2

We can summarise the information for each population:

private_alleles_per_pop <- apply(private_alleles(myData), 1, sum)
private_alleles_per_pop
## Pop1 Pop2 Pop3 Pop4 Pop5 Pop6 
##    6    9   10    3    5   26

Private alleles will impact measures of differentiation between populations. This is especially true, if their frequency is high (i.e., if they are common).

Note that the number of private alleles may be influence by many factors, including sample size.

What would you expect the relationship to be?

Let’s explore this using a plot:

ind_per_pop <- sapply(seppop(myData), nInd)
plot(ind_per_pop, private_alleles_per_pop,
     xlab = "Sample size", ylab = "Number of private alleles", las = 1,
     col = NULL)
text(y = private_alleles_per_pop,
     x = ind_per_pop,
     labels = names(ind_per_pop))


Trees & principal component analysis

Individual-based analyses frequently do not make assumptions about HWE, because they look at the differences between individuals.

But consider how this kind of information must be presented.

individual x individual

It is not surprising that we have come up with ways to visualize this kind of information.

Two very common ways are trees (or networks) and PCA.

One benefit of displaying information in this manner is that we can see if there are groupings of individuals that are more similar to eachother than they are to other individuals.


Proportion of shared alleles

A simple measure of genetic distance is the proportion of shared alleles. This measure does not make assumptions about mutation, genetic drift, etc.

It is, essentially, a measure of “how similar are two genotypes”, with a value ranging from 1 (= identical) to 0 (= no shared alleles).

adegenet has a function to calculate this: propShared().

The output of this function is a matrix with dimensions: no.of.samples x .no.of.samples. This is a bit too much to show on the screen if we consider the whole dataset of 133 samples!

Let us get the proportion of shared alleles among samples in Pop1 and display the information for the first 5 individuals:

similarity_mat <- propShared(myData[pop = "Pop1"])
similarity_mat[1:5, 1:5]
##             AFBIBOR9503 AFBIBOR9504 AFBIBOR9505 AFBIBOR9506 AFBIBOR9507
## AFBIBOR9503   1.0000000   0.4642857   0.2857143   0.5357143   0.4285714
## AFBIBOR9504   0.4642857   1.0000000   0.4285714   0.4642857   0.4285714
## AFBIBOR9505   0.2857143   0.4285714   1.0000000   0.3214286   0.4642857
## AFBIBOR9506   0.5357143   0.4642857   0.3214286   1.0000000   0.3928571
## AFBIBOR9507   0.4285714   0.4285714   0.4642857   0.3928571   1.0000000

The function nj() of the ape package can create a neighbour joining (NJ) tree from a distance matrix.

We can use the similarity matrix we just computed to obtain a distance matrix and run the neighbour joining:

distance_mat <- 1 - similarity_mat
mynj <- nj(distance_mat)
mynj
## 
## Phylogenetic tree with 20 tips and 18 internal nodes.
## 
## Tip labels:
##   AFBIBOR9503, AFBIBOR9504, AFBIBOR9505, AFBIBOR9506, AFBIBOR9507, AFBIBOR9508, ...
## 
## Unrooted; includes branch lengths.

We have reconstructed a NJ tree based on the distance matrix. But we did not plot it!

For this we need to simply use the plot() function, which can handle the output of nj() (or rather, nj() was coded so that we can use plot()). We have several options for the type of tree to be plotted using type =, where we will set unrooted.

plot(mynj, type = "unrooted")

Other distance measures

adegenet and poppr offer us the ability to generate other distance measures. Most of the ones offered in poppr work for individuals (rather than populations).

Note: several of these distance measures make strong assumptions about the biological nature of our genetic samples.

Let’s try another measure that makes no assumptions: Prevosti’s distance, which we can use with prevosti.dist()

mynj_prevosti <- nj(prevosti.dist(seppop(myData)[["Pop1"]]))
plot(mynj_prevosti, type = "unrooted")

Note: we did not need to use 1 - in this case, because the function prevosti.dist() directly provide a distance.

Let us compare the two trees we just made to see if they match:

cophyloplot(mynj, mynj_prevosti,
            assoc =  cbind(mynj$tip.label, mynj_prevosti$tip.label))

Are you happy with that?

Now that we know how to make a NJ tree for a small dataset, we can do it for all samples in myData.

bignj <- nj(prevosti.dist(myData))
plot(bignj, type = "unrooted")

Geeky note: we can improve this plot by plotting colours and changing the type of tree. Here a possibility would be a “fan” plot:

plot(bignj, type = "fan", show.tip.label = FALSE, x.lim = c(-0.7, 0.7), no.margin = TRUE)
tiplabels(text = rownames(myData@tab),
          frame = "none",
          col = rainbow(nPop(myData))[as.numeric(myData@pop)], cex = 0.8, offset = 0.05)
legend("topleft", fill = rainbow(nPop(myData)),
       legend = popNames(myData), bty = "n",
       title = "Population")

Or perhaps a “radial” plot:

plot(bignj, type = "radial", show.tip.label = FALSE, x.lim = c(-0.7, 0.7), no.margin = TRUE)
tiplabels(text = rownames(myData@tab),
          frame = "none",
          col = rainbow(nPop(myData))[as.numeric(myData@pop)], cex = 0.8, offset = 0.05)
legend("topleft", fill = rainbow(nPop(myData)),
       legend = popNames(myData), bty = "n",
       title = "Population")


PCA (Principal Component Analysis)

An alternative to neighbour joining is to reduce the dimensionality of the problem so that it can be plotted in 2 dimensions instead of one dimension per locus. Many options exist but the most common one is the so-called Principal Component Analysis, or PCA for short.

This is how you run a PCA:

myData_matrix <- scaleGen(myData, center = FALSE, scale = FALSE, NA.method = "mean")
mypca <- dudi.pca(myData_matrix, center = TRUE, scale = FALSE, scannf = FALSE, nf = Inf)

What does the PCA do?

The PCA creates new dimensions…

head(mypca$li[, 1:4]) ## only show head for first 4 axes

which are uncorrelated…

head(zapsmall(cor(mypca$li))[, 1:4])  ## only show head for first 4 axes
##       Axis1 Axis2 Axis3 Axis4
## Axis1     1     0     0     0
## Axis2     0     1     0     0
## Axis3     0     0     1     0
## Axis4     0     0     0     1
## Axis5     0     0     0     0
## Axis6     0     0     0     0

and which capture a decreasing amount of variation of the original loci:

barplot(mypca$eig,
        names.arg = colnames(mypca$li),
        cex.names = 0.5,
        col = heat.colors(length(mypca$eig)),
        las = 2, ylab = "Inertia")

or in cumulated percentage:

barplot(cumsum(100*mypca$eig/sum(mypca$eig)),
        names.arg = colnames(mypca$li),
        cex.names = 0.5,
        col = rev(heat.colors(length(mypca$eig))),
        las = 2, log = "y",
        ylab = "Cumulative proportion of variance explained")

Plotting the PCA

There are many ways to plot a PCA, but here we are interested in projecting the individuals into the new loci space, so we use the function s.class():

s.class(mypca$li, fac = pop(myData),
        col = rainbow(nPop(myData)), grid = FALSE, xax = 1, yax = 2, cpoint = 0)
#s.label(mypca$li, add.plot = TRUE, boxes = FALSE, clabel = 0.5)
add.scatter.eig(mypca$eig[1:10], xax = 1, yax = 2, ratio = 0.15)


DAPC (Discriminant Analysis of Principal Components)

The PCA does not force the group to be different, it just shows the overall structure. In contrast, the DAPC is a method that allows to explore whether some combinations of alleles would allows the characterisation of distinct groups.

Here is an example:

myclusters <- find.clusters(myData, n.clust = 3, n.pca = Inf) ## better not fixing n.clust and selecting it interactivelly!!!
dapc1 <- dapc(myData, pop = myclusters$grp, n.pca = Inf, n.da = Inf)
scatter(dapc1)

So yes, it seems that it is possible to create groups… but it is not immediatly clear what these groups correspond to in this dataset.

So let’s display the original populations on top of the plot:

s.class(dapc1$ind.coord, fac = pop(myData),
        col = rainbow(nPop(myData)), grid = FALSE, xax = 1, yax = 2, cpoint = 1)

Another type of visual representation is to mimic that one produced by the famous program Structure:

compoplot(dapc1, show.lab = TRUE, legend = FALSE, cex.names = 0.4,
          lab = paste(pop(myData), rownames(dapc1$tab)))

THE END